DOES MEDIAN FILTERING TRULY PRESERVE EDGES 
BETTER THAN LINEAR FILTERING? 

By Ery Arias-Castro and David L. Donoho 

University of California, San Diego and Stanford University 

Image processing researchers commonly assert that "median fil- 
tering is better than linear filtering for removing noise in the presence 
of edges." Using a straightforward large-n decision-theory framework, 
this folk-theorem is seen to be false in general. We show that median 
filtering and linear filtering have similar asymptotic worst-case mean- 
squared error (MSE) when the signal-to-noise ratio (SNR) is of order 
1, which corresponds to the case of constant per-pixel noise level in 
a digital signal. To see dramatic benefits of median smoothing in an 
asymptotic setting, the per-pixel noise level should tend to zero (i.e., 
SNR should grow very large). 

We show that a two-stage median filtering using two very different 
window widths can dramatically outperform traditional linear and 
median filtering in settings where the underlying object has edges. 
In this two-stage procedure, the first pass, at a fine scale, aims at 
increasing the SNR. The second pass, at a coarser scale, correctly 
exploits the nonlinearity of the median. 

Image processing methods based on nonlinear partial differential 
equations (PDEs) are often said to improve on linear filtering in the 
presence of edges. Such methods seem difficult to analyze rigorously 
in a decision-theoretic framework. A popular example is mean cur- 
vature motion (MCM), which is formally a kind of iterated median 
filtering. Our results on iterated median filtering suggest that some 
PDE-based methods are candidates to rigorously outperform linear 
filtering in an asymptotic framework. 

1. Introduction. 

1.1. Two folk theorems. Linear filtering is fundamental for signal pro- 
cessing, where often it is used to suppress noise while preserving slowly 
varying signal. In image processing, noise suppression is also an important 
task; however, there have been continuing objections to linear filtering of 
images since at least the 1970s, owing to the fact that images have edges. 

In its simplest form, linear filtering consists of taking the average over a 
sliding window of fixed size. Indeed, linear filtering with fixed window size 
h "blurs out" the edges, causing a bias of order O(1) in a region of width h 
around edges. This blurring can be visually annoying and can dominate the 
mean-squared error. 

Median filtering — taking the median over a sliding window of fixed size — 
was discussed already in the 1970s as a potential improvement on linear 
filtering in the "edgy" case, with early work of Matheron and Serra on 
morphological filters [24, 33] in image analysis and (in the case of 1-d signals) 
by Tukey and collaborators [22, 37, 38]. 

To this day simple median filtering is commonly said to improve on linear 
filtering in "edgy" settings [2, 8, 14, 16, 36] — such a claim currently appears 
in the Wikipedia article on median filtering [1]. Formally, we have the 

Median folk theorem. Median filtering outperforms linear filtering 
for suppressing noise in images with edges. 

Since the late 1980s, concern for the drawbacks of linear filtering of images 
with edges has led to increasingly sophisticated proposals. In particular, 
inspired by seminal work of Mumford and Shah [26] and Perona and Malik 
[27], a whole community in applied mathematics has arisen around the use 
of nonlinear partial-differential equations (PDEs) for image processing — 
including noise suppression [25, 31, 34]. 

A commonly heard claim at conferences in image processing and in applied 
mathematics boils down to the following: 

PDE FOLK THEOREM. PDE-based methods outperform linear filtering 
for suppressing noise in images with edges. 

1.2. A challenge to asymptotic decision-theory. While these folk theo- 
rems have many believers, they implicitly pose a challenge to mathematical 
statisticians. 

Linear filtering of the type used in signal and image processing has also 
been of interest to mathematical statisticians in implicit form for several 
decades. Indeed, much of nonparametric regression, probability density es- 
timation and spectral density estimation is in some sense carried out with 
kernel methods — a kind of "linear filter" — and there is extensive literature 
documenting the optimality of such linear procedures in certain cases. In 
many cases, the correspondence of the underlying bias-variance analysis with 
the kind of analyses being done in signal processing is quite evident. 

During the last two decades, mathematical statisticians have succeeded 
in showing that for models of images with edges, nonlinear methods can 
indeed outperform linear ones in a minimax sense [5, 10, 11]. However, the 
nonlinear methods which have been analyzed fully rigorously in a decision- 
theoretic framework are somewhat different than the median and PDE cases 
above; examples include methods of wavelet shrinkage and other harmonic 
analysis techniques [5, 9, 10, 11]. 

So within the decision-theoretic framework it is rigorously possible to do 
better than linear filtering, but by methods somewhat different than those 
covered by the folk theorems above. 

The great popularity of median- and PDE-based nonlinear filtering prompted 
us to evaluate their performance by a rigorous approach within the decision- 
theoretic framework of mathematical statistics. Three conclusions emerge: 

• The Median-filtering folk theorem is false in general. 

• In an apparently meaningless special case — where the noise level per pixel 
is negligible — the Median-filtering folk theorem is true. 

• A modified notion of median filtering — applying two passes in a multiscale 
fashion — does improve on linear filtering, as we show here. 

Before explaining these conclusions in more detail, we make a few remarks. 

• Tukey's emphasis with median filters was always on iterating median 
filters, that is, sequentially applying medians over windows of different 
widths at different stages, as we do here. We believe that Tukey's intuition about 
the benefits of median filtering actually applied to this iterated form; but 
this intuition was either never formalized in print or has been forgotten 
with time. 

• The iterated median scheme we are able to analyze in this paper simply 
involves two passes of medians at two very different scales. 

Finally, we believe our results are part of a bigger picture: 

• There is a formal connection between certain nonlinear PDEs used for 
image processing (i.e., Mean Curvature Motion and related PDEs) and 
iterated medians. 

• Nonlinear PDE-based methods in the form usually proposed by applied 
mathematicians seem quite difficult to analyze within the decision-theoretic 
framework; this seems a looming challenge for mathematical statisticians. 

• Our iterated median scheme seems related to such PDE-based methods. It 
is perhaps less elegant than full nonlinear PDEs but is rigorously analyzed 
here. 

• Because of results reported here, we now suspect some subset of the PDE 
folk theorem may well be true. 

1.3. The framework. We describe the framework below in any dimension 
d; however, this paper focuses on dimensions d = 1 (signal processing) and 
d = 2 (image processing). 

Consider the classical problem of recovering a function $f : [0,1]^d \to [0,1]$ 
from equispaced samples corrupted by additive white noise. Here we observe 
$Y_n = (Y_n(i))$ defined by 

(1.1) $Y_n(i) = f(i/n) + \sigma Z_n(i), \quad i \in I_n^d,$ 

where $I_n = \{1, \ldots, n\}$, d is the dimension of the problem (here, d = 1 or 2), 
$\sigma > 0$ is the noise level and $Z_n$ is white noise with distribution $\Psi$. Only $Y_n$, 
$\sigma$ and $\Psi$ are known. Though unknown, $f$ is restricted to belong to some 
class $\mathcal{F}$ of functions over $[0,1]^d$ with values in [0, 1]; several such $\mathcal{F}$ will be 
explicitly defined in Sections 2, 3 and 4. 

By linear filtering we mean the following variant of moving average. Fix 
a window size $h > 0$ and put 

$L_h[Y_n](i) = \mathrm{Average}\{Y_n(j) : j \in W[n,h](i)\};$ 

here $W[n,h](i)$ denotes the discrete window of radius $nh$ centered at $i \in I_n^d$: 

$W[n,h](i) = \{j \in I_n^d : \|j - i\| \le nh\}.$ 

Similarly, by median filtering we mean 

$M_h[Y_n](i) = \mathrm{Median}\{Y_n(j) : j \in W[n,h](i)\}.$ 

(More general linear and median filters with general kernels add no dramat- 
ically new phenomena in settings of interest to us, i.e., where signals are 
discontinuous, so we ignore them.) 
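
As an illustration, the two filters can be sketched in dimension d = 1 as follows. This is a minimal sketch, not the authors' implementation: the function names `window`, `linear_filter` and `median_filter` are hypothetical, and the window is simply truncated at the ends of the signal, which is what the definition of W[n,h](i) gives when j is restricted to I_n.

```python
import numpy as np

def window(n, h, i):
    """1-based indices j in {1, ..., n} with |j - i| <= n*h."""
    r = int(np.floor(n * h))          # window radius in pixels
    lo, hi = max(1, i - r), min(n, i + r)
    return np.arange(lo, hi + 1)

def linear_filter(y, h):
    """L_h[Y_n]: average of the data over the window around each pixel."""
    n = len(y)
    return np.array([y[window(n, h, i) - 1].mean() for i in range(1, n + 1)])

def median_filter(y, h):
    """M_h[Y_n]: median of the data over the window around each pixel."""
    n = len(y)
    return np.array([np.median(y[window(n, h, i) - 1]) for i in range(1, n + 1)])
```

On a noiseless step such as `(np.arange(1, 17) / 16 > 0.5).astype(float)` with h = 0.125, the median filter returns the step unchanged, while the linear filter replaces the jump by a ramp of width about 2nh pixels.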

Following a traditional approach in mathematical statistics [21], the 
performance of an estimator $T[Y_n]$ is measured according to its worst-case risk 
over the functional class of interest $\mathcal{F}$ with respect to mean-squared error 
(MSE): 

$\mathcal{R}_n(T; \mathcal{F}) = \sup_{f \in \mathcal{F}} \mathcal{R}_n(T; f),$ 

where 

$\mathcal{R}_n(T; f) = \frac{1}{n^d} \sum_{i \in I_n^d} E[(T[Y_n](i) - f(i/n))^2].$ 
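
For a fixed function f, the risk above can be approximated by simulation. The following is a minimal Monte Carlo sketch in dimension d = 1, assuming standard Gaussian noise; the name `empirical_risk` and its parameters are hypothetical. It estimates the risk at a single f only; the supremum over the class is not something a simulation of this kind can compute.

```python
import numpy as np

def empirical_risk(estimator, f, n, sigma, reps=200, seed=0):
    """Monte Carlo estimate of the per-pixel MSE of `estimator` at the function f.

    estimator: maps a noisy signal of length n to an estimate of length n.
    f: vectorized function on [0, 1]; it is sampled at i/n, i = 1, ..., n.
    """
    rng = np.random.default_rng(seed)
    truth = f(np.arange(1, n + 1) / n)
    total = 0.0
    for _ in range(reps):
        y = truth + sigma * rng.standard_normal(n)   # one draw of the model
        total += np.mean((estimator(y) - truth) ** 2)
    return total / reps
```

With the identity estimator (no smoothing) the estimate is close to sigma squared, which gives a simple sanity check before plugging in a filter.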

We consider for $\mathcal{F}$ certain classes of piecewise Lipschitz functions. When 
the noise distribution is sufficiently nice (Gaussian, for example) we show 
that linear filtering and median filtering have worst-case risks with the same 
rates of convergence to zero as $n \to \infty$. This contradicts the Median folk 
theorem. 

Our conclusion does not rely on misbehavior at any farfetched function 
$f \in \mathcal{F}$. In dimension d = 1, linear filtering and median filtering exhibit the 
same worst-case rate of convergence already at the simple step function 
$f(x) = 1_{\{x > 1/2\}}$, the simplest model of an edge. 

Fig. 1. The dash-dotted (black) line represents the noiseless object $f = 1_{[1/2,1]}$; the dashed 
(green) line represents the expected value of linear filtering; the solid (blue) line represents 
the expected value of median filtering. The noise is Gaussian, the smoothing window size 
is h = 0.125 and the sample size n = 512. Only at very small $\sigma$ is the bias of the median 
qualitatively superior to the bias of linear filtering. 



1.4. The underlying phenomenon. The misbehavior of median filtering 
can be traced to the fact that, for a signal-to-noise ratio of order 1 
(specifically, for $\sigma = 1$), its bias is of order 1 in a region of width h near edges; this 
behavior is virtually identical to that of linear filtering. Figure 1 illustrates 
this situation in panel (d). 

However, the figure also illustrates, in panels (a), (b), (c), another 
phenomenon: for very low noise levels $\sigma$, that is, very high signal-to-noise ratios, 
the bias of the median behaves dramatically differently than the bias of 
linear filtering. In particular, the bias is not large over an interval comparable 
to the window width, but only over a much smaller interval. In fact, as 
$\sigma \to 0$, the bias vanishes away from the edge. We call this the: 

True hope of median filtering. At very low noise levels, median 
filtering can dramatically outperform linear filtering. 
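
The contrast between the two noise regimes can be checked numerically at a single pixel near the edge. The sketch below is an illustration under stated assumptions: Gaussian noise, the step function with jump at 1/2, and a pixel placed `offset` pixels to the right of the jump; the name `expected_outputs` and all default values are hypothetical choices, not taken from the paper.

```python
import numpy as np

def expected_outputs(sigma, n=512, h=1 / 16, offset=16, reps=2000, seed=0):
    """Monte Carlo estimates of E[L_h[Y](i)] and E[M_h[Y](i)] at one pixel.

    The pixel i sits `offset` pixels to the right of the jump of the step
    function, so the true value there is 1 and part of the window lies on
    the other side of the edge.
    """
    rng = np.random.default_rng(seed)
    r = int(n * h)                               # window radius in pixels
    i = n // 2 + 1 + offset                      # first pixel right of the jump is n/2 + 1
    j = np.arange(i - r, i + r + 1)              # window indices
    f = (j / n > 0.5).astype(float)              # noiseless values inside the window
    y = f + sigma * rng.standard_normal((reps, j.size))
    return y.mean(axis=1).mean(), np.median(y, axis=1).mean()
```

With the defaults, about a quarter of the window lies on the wrong side of the edge. At sigma = 1 both filters are biased by roughly that fraction; at sigma = 0.01 the linear filter keeps the same bias while the median is essentially unbiased.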

At first glance this seems utterly useless: why should we care to remove 
noise when there is almost no noise? On reflection, a useful idea emerges. 
Suppose we filter in stages, at the first stage using a relatively narrow window 
width — much narrower than we would ordinarily use in a one-stage process — 
and at the second stage using a somewhat wider window width. The result 
may well achieve the noise reduction of the combined two-stage smoothing 
with much smaller bias near edges. 

Heuristic of iterated median filtering. Iterated median filtering, 
in which the data are first median-filtered lightly, at a fine scale, followed 
by a coarse-scale median filter, may outperform linear filtering. 

Note that the same idea, applied to linear filtering, would achieve little. 
The composition of two linear filters can always be achieved by a single 
linear filter with appropriate kernel. And such weights do not change the 
qualitative effect of edges. 
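
The claim about composing linear filters is just associativity of convolution, which is easy to verify numerically. The sketch below uses full zero-padded convolution (`numpy.convolve`) rather than the truncated windows defined earlier, which is a simplifying assumption; `box_kernel` and `compose_kernels` are hypothetical helper names.

```python
import numpy as np

def box_kernel(r):
    """Moving-average weights for a window of radius r (2r + 1 equal weights)."""
    return np.full(2 * r + 1, 1.0 / (2 * r + 1))

def compose_kernels(k1, k2):
    """Kernel of the single linear filter equivalent to applying k1, then k2."""
    return np.convolve(k1, k2)

rng = np.random.default_rng(0)
y = rng.standard_normal(50)
k1, k2 = box_kernel(2), box_kernel(5)

two_pass = np.convolve(np.convolve(y, k1), k2)        # fine pass, then coarse pass
one_pass = np.convolve(y, compose_kernels(k1, k2))    # one pass, composed kernel
print(np.allclose(two_pass, one_pass))
```

The composed kernel is again a set of nonnegative weights summing to one, so two passes of averaging amount to one pass with a wider kernel and blur an edge in the same qualitative way.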



Table 1 

Summary of results. Rates of convergence to zero of worst-case MSE, and of 
optimal window width, for different methods in dimensions d = 1, 2 

                          d = 1                            d = 2
Technique                 Rate       Window width          Rate       Window width
Linear filter             n^{-1/2}   n^{-1/2}              n^{-2/3}   n^{-2/3}
Median filter             n^{-1/2}   n^{-1/2}              n^{-2/3}   n^{-2/3}
Two-scale median filter   n^{-2/3}   n^{-2/3}, n^{-1/3}    n^{-6/7}   n^{-6/7}, n^{-4/7}
Edge-free optimal         n^{-2/3}   n^{-1/3}              n^{-1}     n^{-1/2}


In each case, the underlying class of functions has Lipschitz smoothness away from edges. 
Results compiled from theorems below. Note: here n is the signal width in pixels, not the 
sample size. The sample size is n d in dimension d. 

1.5. Results of this paper. Table 1 compiles the worst-case risk rates for 
linear filtering, median filtering and two-scale median filtering over 
piecewise Lipschitz function classes. The last line displays the minimax rates for 
Lipschitz functions without discontinuities. Note that the rates would be the 
same for classes of functions with higher degree of smoothness away from the 
edges; indeed, the three filtering methods considered here have worst-case 
risk of the same order of magnitude as their MSE for a simple step function such 
as $f(x) = 1_{\{x > 1/2\}}$ in dimension d = 1. 

Notice also that, in dimension d = 1, our two-scale median filtering achieves 
the minimax rate for edge-free Lipschitz function classes. This is not the case 
in dimension d = 2, where other methods are superior [5, 21]. 

1.6. Iterated medians. As mentioned above, in the late 1960s Tukey al- 
ready proposed the use of iterated medians, although the motivations re- 
mained unclear to many at the time. In his proposals, different scales were 
involved at different iterations, although the scales were relatively similar 
from the current viewpoint. 

Significantly, iterated median filtering converges in some sense to Mean 
Curvature Motion (MCM), a popular PDE-based technique. In fact, there 
is a wider link connecting PDE-based methods and iterative filtering; in 
particular, iterated linear filtering converges to the Heat equation. See, for 
example, [7, 8, 15]. Although MCM is highly nonlinear and hard to analyze, 
the heuristic above gives a hint that MCM might improve on linear filtering. 

1.7. Prior literature. Several papers analyze the performance of median 
filtering numerically using simulations. For example, in [17], the authors 
derive exact formulas for the distribution of the result of applying median 
filtering to a simple noisy edge like $f_0$, and use computer-intensive 
simulations to provide numerical values. A similar approach is found in [19]. 

Closer to the present paper, [20] compares linear filtering and median fil- 
tering in the context of smooth functions and shows that they have minimax 
rates of same order of magnitude. We will show here that the same holds 
for functions with discontinuities. 

An extensive, but unrelated, body of literature explores the median's 
ability to suppress outliers, for example, [13] and also [28], which consider 
the case of smoothing a one-dimensional signal corrupted with impulsive 
noise. Using a similar framework, Donoho and Yu [12] study a pyramidal 
median transform. 

1.8. Contents. In Sections 2 and 3, we consider one-dimensional and two- 
dimensional signals, respectively, in the constant-noise level case. In Section 
4 we consider per-pixel noise level tending to 0. In Section 5, we introduce a 
two-scale median filtering and formulate results quantifying its performance. 
The proofs are postponed to the later sections. 

Though our results can be generalized to higher-dimensional signals and 
other smoothness classes, and can accommodate more sophisticated kernels, 
we choose not to pursue such extensions and generalizations here. 

1.9. Notation. Below, we denote comparable asymptotic behavior of 
sequences $(a_n), (b_n) \in \mathbb{R}^{\mathbb{N}}$ using $a_n \asymp b_n$, meaning that the ratio $a_n/b_n$ is 
bounded away from 0 and $\infty$ as n becomes large. 

2. Linear filtering and median filtering in dimension 1. Consider the 
model (1.1) introduced in the Introduction for the case of dimension 1 (d = 
1). We now explicitly define the smoothness class of interest. 

Definition 2.1. The local Lipschitz constant of the function $f$ at x is 

$\mathrm{Lip}_x(f) = \limsup_{\varepsilon \to 0} \; \sup_{|y - x| < \varepsilon,\ y \ne x} \frac{|f(y) - f(x)|}{|y - x|}.$ 

A function $f : [0,1] \to \mathbb{R}$ will be called essentially local Lipschitz if the 
essential supremum of the local Lipschitz constant on [0, 1] is finite. 

The function $x\,1_{\{x > 1/2\}}(x)$ is essentially local Lipschitz but not Lipschitz; 
it has a local Lipschitz constant $\le 1$ almost everywhere on [0, 1], but jumps 
at the point x = 1/2. More generally, piecewise polynomials without 
continuity constraints at piece boundaries are essentially local Lipschitz and yet 
neither Lipschitz nor continuous. 

Definition 2.2. Fix $N \ge 1$ and $\beta > 0$. The class of punctuated-Lipschitz 
functions pLip = pLip($\beta$, N) is the collection of functions $f : [0,1] \to [0,1]$ 
with local Lipschitz constant bounded by $\beta$ on the complement of some 
finite set $(x_i)_{i=1}^N \subset (0,1)$. 

Theorem 2.1. Assume $\Psi$ has mean 0 and finite variance. Then, 

$\inf_{h>0} \mathcal{R}_n(L_h; \mathrm{pLip}) \asymp n^{-1/2}, \quad n \to \infty.$ 

This result can be proven using standard bias-variance trade-off ideas, and 
needs only simple technical ingredients such as uniform bounds on functions 
and derivatives. We explain in Section 7 that it can be inferred from existing 
results, but then proceed to give a proof; this proof sets up a bias-variance 
trade-off framework suitable for several less elementary situations which 
come later. 

To analyze the behavior of median filtering, we must obtain uniform 
bounds on the stochastic behavior of empirical quantiles; these are laid out 
in Section 15 below. To enable such bounds, we make the following 
assumptions on the noise distribution $\Psi$: 

[Shape] $\Psi$ has density $\psi$ with respect to the Lebesgue measure on $\mathbb{R}$, with 
$\psi$ unimodal, continuous and symmetric about 0. 

[Decay] $\zeta(\Psi) := \sup\{s > 0 : \psi(x)(|x| + 1)^s \text{ is bounded}\} > 1$. 

Note that the normal, double-exponential, Cauchy and uniform 
distributions centered at 0 all satisfy both [Shape] and [Decay]. These conditions 
permit an efficient proof of the following result — see Section 8; the conditions 
could probably be relaxed considerably, leading to a more difficult proof. 

Theorem 2.2. Assume $\Psi$ satisfies [Shape] and [Decay]. Then, 

$\inf_{h>0} \mathcal{R}_n(M_h; \mathrm{pLip}) \asymp n^{-1/2}, \quad n \to \infty.$ 

For piecewise Lipschitz functions corrupted with white additive 
Gaussian noise (say) having constant per-pixel noise level, Theorems 2.1 and 2.2 
show that linear filtering and median filtering have risks of the same order of 
magnitude, $\mathcal{R}_n \asymp n^{-1/2}$; the same holds for any noise distribution $\Psi$ with 
finite variance satisfying [Shape] and [Decay]. In that sense, the Median folk 
theorem is contradicted. 

The proof shows that the optimal order of magnitude for the width $h_n$ 
of the median smoothing window obeys $h_n \asymp n^{-1/2}$, again the same as in 
linear filtering. 

3. Linear filtering and median filtering in dimension 2. Consider mod- 
el (1.1) of the Introduction in the case of dimension 2 (d = 2); our "signals" 
are now digital images. Our smoothness class here is a class of cartoon im- 
ages, which are piecewise functions that are smooth except for discontinuities 
along smooth curves — see also [6, 9, 21]. 

Just as in d = 1, we have the notion of local Lipschitz constant. In d = 2, 
the function $x\,1_{\{x > 1/2\}}(x, y)$ is essentially local Lipschitz but not Lipschitz; 
this has a local Lipschitz constant bounded by 1 almost everywhere, but 
the function jumps as we cross the line x = 1/2. More generally, cartoon 
images have the same character: essentially local Lipschitz and yet neither 
Lipschitz nor continuous. Such cartoon images, of course, have jumps along 
collections of regular curves; we formalize such collections as follows. 

Definition 3.1. A finite collection of rectifiable planar curves will be 
called a complex. Fix $\lambda > 0$ and let $\Gamma = \Gamma(\lambda)$ denote the class of rectifiable 
curves in $[0,1]^2$ with length at most $\lambda$. Let $C(N, \lambda)$ denote the collection of 
complexes composed of at most N curves from $\Gamma(\lambda)$. 






Definition 3.2. Fix $N \ge 1$ and $\beta > 0$. The class of curve-punctuated 
Lipschitz functions cpLip = cpLip($\lambda$, $\beta$, N) is the collection of functions 
$f : [0,1]^2 \to [0,1]$ having local Lipschitz constant bounded by $\beta$ on the 
complement of a $C(N, \lambda)$-complex. 

Informally, such functions are "locally Lipschitz away from edges" and 
indeed can be viewed as models of "cartoons." We prove the next two results 
in Sections 10 and 11, respectively. 

Theorem 3.1. Assume $\Psi$ has mean 0 and variance 1. Then 

$\inf_{h>0} \mathcal{R}_n(L_h; \mathrm{cpLip}) \asymp n^{-2/3}, \quad n \to \infty.$ 

Theorem 3.2. Assume $\Psi$ satisfies [Shape] and [Decay]. Then, 

$\inf_{h>0} \mathcal{R}_n(M_h; \mathrm{cpLip}) \asymp n^{-2/3}, \quad n \to \infty.$ 

The situation parallels the one-dimensional case. In words, for cartoon 
images corrupted with white additive noise with constant per-pixel noise level, 
linear filtering and median filtering have risks of the same order of magnitude, 
$\mathcal{R}_n \asymp n^{-2/3}$. Again, the Median folk theorem is contradicted. 

The proofs show that the width of the optimal smoothing window for 
either type of smoothing is $\asymp n^{-2/3}$. 

4. Linear filtering and median filtering with negligible per-pixel noise 
level. The analysis so far assumes that the noise level is comparable to the 
signal level. 

For very low-noise-per-pixel level and discontinuities well-separated from 
the boundary and each other, the situation is completely different: the Me- 
dian folk theorem holds true. 

Preliminary remark. We will see in Sections 9 and 12 that, both in 
dimensions d = 1 and d = 2, linear filtering does not improve on no-smoothing 
if $\sigma_n n^{1/2} = O(1)$, while median filtering improves on no-smoothing if 
$\sigma_n n \to \infty$. In Theorems 4.1 and 4.2 below we therefore exclude the situation 
$\sigma_n n = O(1)$. 

Definition 4.1. The finite point set $(x_i)_{i=1}^N \subset [0,1]$ is called well-separated 
with separation constant $\eta > 0$ if (i) each point is at least $\eta$-separated from 
the boundary {0, 1}: 

$\min(x_i, 1 - x_i) \ge \eta \quad \forall i$ 

and (ii) each point is at least $\eta$-separated from every other point: 

$|x_j - x_i| \ge \eta \quad \forall i \ne j.$ 

Definition 4.2. Let SEP-pLip = SEP-pLip($\eta$, $\beta$, N) denote the class of 
functions in pLip($\beta$, N) which have local Lipschitz constant $\le \beta$ on the 
complement of some $\eta$-well-separated set $(x_i)_{i=1}^N$. 

Theorem 4.1. Let $\Psi$ satisfy [Shape] and [Decay] and let the per-pixel 
noise level tend to zero with increasing sample size, $\sigma = \sigma_n \to 0$ as $n \to \infty$, 
with $\sigma_n n \to \infty$. Then, 

$\inf_{h>0} \mathcal{R}_n(M_h; \text{SEP-pLip}) = o\Big(\inf_{h>0} \mathcal{R}_n(L_h; \text{SEP-pLip})\Big), \quad n \to \infty.$ 

The proof is in Section 9, where we provide more explicit bounds for the 
risks of linear and median filtering. 

In dimension d = 2, we again have that for negligible per-pixel noise level 
the Median folk theorem holds true. To show this, we need the hypothesis 
that the discontinuity curves are well-separated from the boundary, and from 
each other. 

Definition 4.3 (Well-separated complex). Let d(A, B) denote Hausdorff 
distance between compact sets A and B. A complex $C = (\gamma_i)$ of 
rectifiable curves in $[0,1]^2$ is said to be well-separated with separation parameter 
$\eta > 0$, if (i) the curves are separated from the boundary of the square: 

$d(\gamma_i, \mathrm{bdry}[0,1]^2) \ge \eta \quad \forall i$ 

and (ii) the curves are separated from each other: 

$d(\gamma_i, \gamma_j) \ge \eta \quad \forall i \ne j.$ 

We also need that the curves are well-separated from themselves (i.e., do 
not loop back on themselves). Formally, we need the following condition. 

Definition 4.4 ($C^2$ chord-arc curves). Fix parameters $\lambda, \kappa, \theta$. Let $\Gamma_2 = 
\Gamma_2(\lambda, \kappa, \theta)$ be the collection of planar $C^2$ curves $\gamma$ with curvature bounded 
by $\kappa$ and chord-arc ratio bounded by $\theta$: 

$\forall s < t: \quad \frac{t - s}{|\gamma(t) - \gamma(s)|} \le \theta \quad \text{and} \quad \frac{\mathrm{length}(\gamma) - t + s}{|\gamma(t) - \gamma(s)|} \le \theta.$ 

Related classes of curves appear in, for example, Section 5.3 of [21]. Note that 
curves with bounded chord-arc ratio appear in harmonic analysis related to 
potential theory, for example, [32]. 

Definition 4.5. Let SEP-cpLip = SEP-cpLip($\lambda, \theta, \kappa, \eta, \beta$, N) be the 
collection of curve-punctuated Lipschitz functions with local Lipschitz constant 
bounded by $\beta$ on the complement of an $\eta$-well-separated $C(N, \lambda)$-complex 
of $\kappa, \theta$ chord-arc curves. 

For an example, let $f = 1_D$, where D is the disk of radius 1/4 centered at 
(1/2, 1/2). Then the exceptional complex is $C = \{\gamma_1\}$, where 

$\gamma_1(\theta) = (\tfrac{1}{2}, \tfrac{1}{2})' + \tfrac{1}{4}(\cos\theta, \sin\theta)', \quad \theta \in [0, 2\pi).$ 

For this example, $f \in$ SEP-cpLip with parameters 

$N \ge 1, \quad \lambda \ge \pi/2, \quad \beta > 0, \quad \theta \ge \pi/2, \quad \kappa \ge 4, \quad \eta \le 1/4.$ 

As this example shows, we may choose parameters so that the classes $\Gamma_2$ and 
SEP-cpLip are nonempty. In the sequel, we assume this has been done, so 
that SEP-cpLip contains the $f$ just given. 

Theorem 4.2. Assume $\Psi$ satisfies [Shape] and [Decay] and that the 
per-pixel noise level tends to zero with increasing sample size, $\sigma = \sigma_n \to 0$ 
as $n \to \infty$, with $\sigma_n n \to \infty$. Then, 

$\inf_{h>0} \mathcal{R}_n(M_h; \text{SEP-cpLip}) = o\Big(\inf_{h>0} \mathcal{R}_n(L_h; \text{SEP-cpLip})\Big), \quad n \to \infty.$ 

The proof is in Section 12. 

5. Iterated and two-scale median filtering. Iterated (or repeated) 
median filtering applies a series of median filters 
$M_{h_1, \ldots, h_m}[Y_n] = M_{h_m} \circ \cdots \circ M_{h_1}[Y_n]$. 
That is, median filtering with window size $h_1$ is first applied, then 
median filtering with window size $h_2$ is applied to the resulting signal, and 
so on. In the 1970s Tukey advocated such compositions of medians in 
connection with d = 1 signals, for example, applying medians of lengths 3, 5, 
and 7 in sequence (possibly along with other operations, including linear 
filtering). Here we are interested in much longer windows than Tukey, in fact 
in windows that grow large as n increases. 

Tukey also advocated the iteration of medians until convergence: his 
so-called "3R" median filter applies running medians of three repeatedly until 
no change occurs. The mathematical study of repeated medians $M_{h_1, \ldots, h_m}$ 
is a challenging endeavor, however, because of the strong dependency that 
median filtering introduces with every pass, though [3, 4] attempt to carry 
out just such studies, in the situation where there is only noise and no signal. 
See also Rousseeuw and Bassett [29]. 

Here, inspired by the intuition supplied in Figure 1 and by the results of 
the previous section, we consider two-scale median filtering. The first pass 
aims at increasing the signal-to-noise ratio, so the second pass can exploit 
the promising characteristics of median filtering at high signal-to-noise ratio. 

We describe the process in dimension d. For h > 0, consider the squares 

$B_k^h = [k_1 nh + 1, (k_1 + 1)nh) \times \cdots \times [k_d nh + 1, (k_d + 1)nh),$ 

where $k = (k_1, \ldots, k_d) \in \mathbb{N}^d$ with $0 \le k_j < 1/h$. Fix $0 < h_1 < h_2 < 1$ and define 
$M_{h_1,h_2}[Y_n]$ as follows: 

1. For $k = (k_1, \ldots, k_d) \in \mathbb{N}^d$ with $0 \le k_j < 1/h_1$, define 

$Y_n^{h_1}(k) = \mathrm{Median}\{Y_n(i) : i \in B_k^{h_1}\};$ 

thus $Y_n^{h_1}$ is a coarsened version of $Y_n$. 

2. For $i \in B_k^{h_1}$, define 

$M_{h_1,h_2}[Y_n](i) = M_{h_2}[Y_n^{h_1}](k).$ 

In words, we apply median filtering to the coarse-scale version $Y_n^{h_1}$ and get 
back the fine-scale version by crude piecewise interpolation. The following 
results, for d = 1, 2, respectively, are proven in Sections 13 and 14. 
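
The two steps above can be sketched in dimension d = 1 as follows. This is a minimal sketch under assumptions that the text does not spell out: blocks are taken as consecutive runs of n*h1 pixels (so n must be a multiple of the block length), and the second pass treats the coarse signal as a signal of length 1/h1 and uses a window of radius floor(h2/h1) blocks, truncated at the ends. The name `two_scale_median` is hypothetical.

```python
import numpy as np

def two_scale_median(y, h1, h2):
    """Two-scale median filter: block medians at scale h1, then a running
    median at scale h2 on the coarse signal, then piecewise-constant expansion."""
    if not 0 < h1 < h2 < 1:
        raise ValueError("need 0 < h1 < h2 < 1")
    y = np.asarray(y, dtype=float)
    n = len(y)
    b = max(1, int(round(n * h1)))        # block length in pixels (assumption)
    if n % b != 0:
        raise ValueError("n must be a multiple of the block length")
    K = n // b                            # number of blocks, about 1/h1

    # Step 1: coarsen by taking the median of each block.
    coarse = np.median(y.reshape(K, b), axis=1)

    # Step 2: running median on the coarse signal, window radius in blocks.
    r = int(np.floor(K * h2))
    smoothed = np.array([np.median(coarse[max(0, k - r):min(K, k + r + 1)])
                         for k in range(K)])

    # Crude piecewise interpolation back to the fine grid.
    return np.repeat(smoothed, b)
```

For instance, `two_scale_median(y, 1/16, 1/8)` on a signal of length 64 uses blocks of 4 pixels and a coarse window of radius 2 blocks; on a noiseless step whose jump falls on a block boundary it returns the step unchanged.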

Theorem 5.1. Assume $\Psi$ satisfies [Shape] and [Decay]. Then, 

$\inf_{0 < h_1 < h_2} \mathcal{R}_n(M_{h_1,h_2}; \text{SEP-pLip}) \asymp n^{-2/3}, \quad n \to \infty.$ 

Theorem 5.2. Assume $\Psi$ satisfies [Shape] and [Decay]. Then, 

$\inf_{0 < h_1 < h_2} \mathcal{R}_n(M_{h_1,h_2}; \text{SEP-cpLip}) \asymp n^{-6/7}, \quad n \to \infty.$ 

Compare Theorems 5.1 and 5.2 with Theorems 2.2 and 3.2, respectively. In 
dimension d = 1, the rate improves from $O(n^{-1/2})$ to $O(n^{-2/3})$; in dimension 
d = 2, from $O(n^{-2/3})$ to $O(n^{-6/7})$. Hence, with carefully chosen window 
sizes, our two-scale median filtering outperforms linear filtering. The optimal 
choices are $h_1 \asymp n^{-2/3}$ and $h_2 \asymp n^{-1/3}$ for d = 1; $h_1 \asymp n^{-6/7}$ and $h_2 \asymp n^{-4/7}$ 
for d = 2. 

Though this falls short of proving that indefinitely iterated median filter- 
ing of the sort envisioned by Tukey dramatically improves on linear filtering 
or that the PDE folk theorem is true for Mean Curvature Motion, it certainly 
suggests hypotheses for future research in those directions. 

6. Tools for analysis of medians. Before proceeding step-by-step with 
proofs of the theorems announced above, we isolate some special facts about 
medians which are used frequently and which ultimately drive our analysis. 

6.1. Elementary properties. Let $\mathrm{Med}_n(\cdot)$ denote the empirical median of 
n numbers. We make the following obvious but essential observations: 

• Monotonicity. If $x_i \le y_i$, $i = 1, \ldots, n$, 

(6.1) $\mathrm{Med}_n(x_1, \ldots, x_n) \le \mathrm{Med}_n(y_1, \ldots, y_n).$ 

• Lipschitz mapping. 

(6.2) $|\mathrm{Med}_n(x_1, \ldots, x_n) - \mathrm{Med}_n(y_1, \ldots, y_n)| \le \max_i |x_i - y_i|.$ 
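
Both properties are easy to check numerically on random inputs. The sketch below does so with `numpy.median`; the function name `check_median_properties`, the sample size and the perturbation ranges are hypothetical choices made for illustration.

```python
import numpy as np

def check_median_properties(trials=1000, n=11, seed=0):
    """Return True if (6.1) and (6.2) hold on `trials` random pairs of samples."""
    rng = np.random.default_rng(seed)
    ok = True
    for _ in range(trials):
        x = rng.standard_normal(n)

        # Monotonicity (6.1): y dominates x coordinate by coordinate.
        y = x + rng.uniform(0.0, 1.0, n)
        ok &= np.median(x) <= np.median(y)

        # Lipschitz mapping (6.2): the median moves by at most the largest
        # coordinate-wise perturbation (small tolerance for rounding).
        z = x + rng.uniform(-1.0, 1.0, n)
        ok &= abs(np.median(x) - np.median(z)) <= np.max(np.abs(x - z)) + 1e-12
    return bool(ok)
```

Calling `check_median_properties()` returns `True`; a failure would indicate a coding error rather than a counterexample, since both inequalities hold for every order statistic.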

6.2. Bias bounds. Since Huber [18] the median is known to be optimally 
robust against bias due to data contamination. Such robustness is essential 
to our analysis of the behavior of median filtering near edges. In effect, data 
contributed by the "other side" of an edge act as contamination that the 
median can optimally resist. 

Consider now a composite dataset of n + m points made from $x_i$, 
$i = 1, \ldots, n$, and $y_i$, $i = 1, \ldots, m$. Think of the y's as "bad" contamination of the 
"good" $x_i$ which may potentially corrupt the value of the median. How 
much damage can the y's do? Equation (6.1) yields the mixture bounds 

(6.3) $\mathrm{Med}_{n+m}(x_1, \ldots, x_n, -\infty, \ldots, -\infty) \le \mathrm{Med}_{n+m}(x_1, \ldots, x_n, y_1, \ldots, y_m) \le \mathrm{Med}_{n+m}(x_1, \ldots, x_n, \infty, \ldots, \infty).$ 

Observe that if m < n, then the median of the combined sample cannot be 
larger than the maximum of the x's nor can it be smaller than the minimum 
of the x's: 

$\mathrm{Med}_{n+m}(x_1, \ldots, x_n, -\infty, \ldots, -\infty) \ge \min(x_1, \ldots, x_n)$ 

and 

$\mathrm{Med}_{n+m}(x_1, \ldots, x_n, \infty, \ldots, \infty) \le \max(x_1, \ldots, x_n).$ 

Generalizing this observation leads to bias bounds employing the 
empirical quantiles of $x_1, \ldots, x_n$. Let $F_n(t) = n^{-1}\#\{i : x_i \le t\}$ be the usual 
cumulative distribution function of the numbers $(x_i)$, and let $F_n^{-1}$ denote the 
empirical quantile function. Set $\varepsilon = m/(m + n)$ and suppose $\varepsilon \in (0, 1/2)$. As 
in [18], we bound the median of the combined sample by the quantiles of 
the "good" data only: 

(6.4) $F_n^{-1}\Big(\frac{1/2 - \varepsilon}{1 - \varepsilon}\Big) \le \mathrm{Med}_{n+m}(x_1, \ldots, x_n, y_1, \ldots, y_m) \le F_n^{-1}\Big(\frac{1/2}{1 - \varepsilon}\Big).$ 

This inequality will be helpful later, when the combined sample corresponds 
to all the data within a window of the median filter, the "good" data cor- 
respond to the part of the window on the "right" side of an edge, and the 
"bad" data correspond to the part of the window on the "wrong" side of the 
edge. 
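
The contamination bounds can be made concrete in terms of order statistics of the good data. The sketch below is an illustration under one stated convention: the median of an even-sized sample is taken to be the lower of the two middle values, so that the bounds are attained exactly; `lower_median` and `contamination_bounds` are hypothetical names. With rank k = ceil((n + m)/2) in the combined sample, the two bounds are the order statistics of the x's of ranks k - m and k, which correspond to the quantile levels (1/2 - eps)/(1 - eps) and (1/2)/(1 - eps) up to rounding.

```python
import numpy as np

def lower_median(v):
    """Order statistic of rank ceil(len(v) / 2): the lower median."""
    v = np.sort(np.asarray(v, dtype=float))
    return v[(len(v) + 1) // 2 - 1]

def contamination_bounds(x, m):
    """Smallest and largest value the lower median of x together with m
    arbitrary contaminating points can take, for 0 < m < len(x)."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    if not 0 < m < n:
        raise ValueError("need 0 < m < n")
    k = (n + m + 1) // 2                  # rank of the lower median among n + m points
    return xs[k - m - 1], xs[k - 1]       # contamination at -inf, resp. +inf
```

For any choice of the m contaminating values, however extreme, the median of the combined sample stays between the two returned order statistics, and in particular between the minimum and the maximum of the good data.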

6.3. Variance bounds for uncontaminated data. The stochastic 
properties of the median are also crucial in our analysis; in particular we need 
bounds on the variance of the median of "uncontaminated" samples, that 
is, of the samples $(Z_i)_{i=1}^m$, with the $Z_i$ i.i.d. from $\Psi$. The following bounds on 
the variance of empirical medians behave similarly to expressions for variances 
of empirical averages. 

Lemma 6.1. Suppose $\Psi$ satisfies [Shape] and [Decay]. Then, there are 
constants $C_1, C_2$ depending only on $\zeta(\Psi)$, such that 

$\frac{C_1}{m} \le E[\mathrm{Med}_m(Z_1, \ldots, Z_m)^2] \le \frac{C_2}{m}, \quad m = 1, 2, \ldots.$ 

Proof. In Section 15, we prove Lemma 15.1 which states that [Shape] 
and [Decay] imply a condition due to David Mason, allowing us to apply 
Proposition 2 in [23]. □ 

We also need to analyze the properties of repeated medians (medians 
of medians). Borrowing ideas of Rousseeuw and Bassett [29], we prove the 
following in Section 15. 

Lemma 6.2. Assume $\Psi$ satisfies [Shape] and [Decay], and consider
$Z_1, \ldots, Z_m$ a sample from $\Psi$. Let $\Psi_m$ denote the distribution of $m^{1/2} \, \mathrm{Median}\{Z_1, \ldots, Z_m\}$.
Then, for all $m$, $\Psi_m$ satisfies [Shape] and [Decay]. More precisely,
there is a constant $C$ such that, for $m$ large enough, $\psi_m(x)(1 + |x|)^4 \le C$ for
all $x$.

6.4. Variance of empirical quantiles, uncontaminated data. Because of
the bias bound (6.4) it will be important to control not only the empirical
median, but also other empirical quantiles besides $p = 1/2$.

Let $Z_{m,p}$ denote the empirical $p$-quantile of $Z_1, \ldots, Z_m$, a sample from
$\Psi$. That is, with $Z_{(1)}, \ldots, Z_{(m)}$ denoting order statistics of the sample, and
$0 < p < 1$,

\[ Z_{m,p} = Z_{(1 + \lfloor mp \rfloor)}. \]
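
For concreteness, the rank convention $Z_{m,p} = Z_{(1 + \lfloor mp \rfloor)}$ can be written out as follows. This is a minimal sketch; the function name is hypothetical and the guard on the rank is only a floating-point precaution added for this example.

```python
import math


def empirical_quantile(sample, p):
    """Empirical p-quantile Z_{m,p} = Z_(1 + floor(m p)), with 1-based ranks."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    ordered = sorted(sample)
    m = len(ordered)
    rank = min(1 + math.floor(m * p), m)  # guard against rounding at p near 1
    return ordered[rank - 1]


print(empirical_quantile([3, 1, 2, 5, 4], 0.5))  # rank 1 + floor(2.5) = 3
```

With $m = 5$ and $p = 1/2$ the rank is 3, so the quantile is the middle order statistic, in agreement with the usual median for odd sample sizes.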

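Lemma 6.3. Fix $\zeta > 1$. Let $\Psi$ satisfy [Shape] and [Decay]. Define

\[ \alpha = \begin{cases} \dfrac{5\zeta - 3}{4\zeta - 4}, & \zeta > 3, \\[1ex] \dfrac{\zeta}{\zeta - 1}, & \zeta \le 3. \end{cases} \tag{6.5} \]

There is a constant $C > 0$ such that, for all sufficiently large positive integers
$m$ and $p \in (2\alpha/m, 1 - 2\alpha/m)$,

\[ E[Z_{m,p}^2] \le C (p(1-p))^{-2\alpha+2}. \]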

Proof. Again noting Lemma 15.1, we are entitled to apply Proposition 
2 in [23]. We then invoke Lemma 15.2. □ 

In words, provided that we do not consider quantiles $p$ very close to the
extremes 0 and 1, the variance is well-controlled. The rate at which the
variance blows up as $p \to 0$ or 1 is ultimately determined by the value of
$\zeta > 1$ and will be of crucial significance for some bounds below.

6.5. MSE lower bound for contaminated data. As a final key ingredient
in our analysis, we develop a simple lower bound on the mean-square
displacement of the empirical median of contaminated data. Let

\[ \Lambda_{n,m,\Delta} \sim \mathrm{Med}_{n+m}(Z_1, \ldots, Z_n, Z_{n+1} + \Delta, \ldots, Z_{n+m} + \Delta). \]

In words this is the empirical median of $n + m$ values, the first $n$ of which
are "good" data with median 0, and the last $m$ of which are contaminated,
having median $\Delta$. For $\Delta > 0$ and $\varepsilon \in (0, 1)$, define the mixture CDF

\[ F_{\varepsilon,\Delta}(\cdot) = (1 - \varepsilon)\Psi(\cdot) + \varepsilon\Psi(\cdot - \Delta). \tag{6.6} \]

Let $\mu = \mu(\varepsilon, \Delta)$ be the corresponding population median:

\[ \mu(\varepsilon, \Delta) = F_{\varepsilon,\Delta}^{-1}(1/2). \]

Actually, $\mu(\varepsilon, \Delta)$ is almost the population median of the empirical median
$\Lambda_{n,m,\Delta}$. More precisely, we have the following lemma.

Lemma 6.4. For $\varepsilon = m/(n + m)$,

\[ P\{\Lambda_{n,m,\Delta} \ge \mu(\varepsilon, \Delta)\} \ge 1/2. \]

Proof. For all $a \in \mathbb{R}$,

\[ P\{\Lambda_{n,m,\Delta} < a\} \le P\left\{ \sum_{j=1}^{n} 1\{Z_j < a\} + \sum_{j=n+1}^{n+m} 1\{Z_j < a - \Delta\} \ge (n + m)/2 \right\}. \]

Applying a result of Hoeffding [35], page 805, Inequality 1, we get that, for
$a < \mu(\varepsilon, \Delta)$, the right-hand side is bounded by

\[ P\{\mathrm{Bin}(n + m, F_{\varepsilon,\Delta}(a)) \ge (n + m)/2\} \le 1/2. \qquad \Box \]

For each fixed $\Delta > 0$, increasing contamination only increases the
population median:

(6.7) $\mu(\varepsilon, \Delta)$ is an increasing function of $\varepsilon \in (0, 1/2)$.

Combining the last two observations, we have the MSE lower bound

\[ E[\Lambda_{n,m,\Delta}^2] \ge \mu(\varepsilon_0, \Delta)^2/2, \qquad \varepsilon_0 \le m/(n + m). \tag{6.8} \]
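
The population median $\mu(\varepsilon, \Delta)$ of the mixture (6.6) has no closed form in general, but it is easy to compute by bisection. The sketch below assumes a standard Gaussian $\Psi$ purely for illustration (the text only requires [Shape] and [Decay]); the function names and the tolerance are choices of this example. For a symmetric $\Psi$ the mixture CDF is at most 1/2 at 0 and at least 1/2 at $\Delta$, so the root is bracketed by $[0, \Delta]$.

```python
import math


def gaussian_cdf(x):
    """Standard normal CDF, used here as an assumed example of Psi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mixture_median(eps, delta, cdf=gaussian_cdf, tol=1e-12):
    """Median of F(x) = (1 - eps) * cdf(x) + eps * cdf(x - delta), by bisection."""
    if not (0.0 <= eps < 0.5 and delta >= 0.0):
        raise ValueError("expected 0 <= eps < 1/2 and delta >= 0")
    lo, hi = 0.0, delta
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (1.0 - eps) * cdf(mid) + eps * cdf(mid - delta) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Monotonicity in eps, as in (6.7), for a fixed shift delta = 2.
for eps in (0.1, 0.2, 0.3, 0.4):
    print(eps, round(mixture_median(eps, 2.0), 4))
```

The printed medians increase with the contamination fraction, which is the property used to pass from Lemma 6.4 to the lower bound (6.8).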

7. Proof of Theorem 2.1. We now turn to proofs of our main results.
In what follows, $C$ stands for a generic positive constant that depends only
on the relevant function class and the distribution $\Psi$; its value may change
from appearance to appearance. Also, to simplify the notation we use $W(i) =
W[n, h](i)$, $L_h(i) = L_h[Y_n](i)$ and so on. We also write $\mathcal{F}_1$ in place of pLip.

7.1. Upper bound. Fix $f \in \mathcal{F}_1$. Let $x_1, \ldots, x_N \in (0, 1)$ denote the points
where $f$ is allowed to be discontinuous. Here and throughout the rest of the
paper we assume that $h \ge 1/n$. Indeed, if to the contrary $h < 1/n$, then for
all $i \in I_n$, $W(i) = \{i\}$ and so $\mathcal{R}_n(L_h; f) = \sigma^2 \asymp 1$.

We will demonstrate that

\[ \mathcal{R}_n(L_h; \mathcal{F}_1) \le C\left(h + \frac{1}{nh}\right). \tag{7.1} \]

Minimizing the right-hand side as a function of $h \ge 1/n$ gives $h_n = n^{-1/2}$,
which implies our desired upper bound:

\[ \mathcal{R}_n(L_{h_n}; \mathcal{F}_1) \le C n^{-1/2}. \tag{7.2} \]

This upper bound may also be obtained from existing results, because $\mathcal{F}_1$
is included in a total-variation ball. Indeed,

\[ \|f\|_{BV} \le 2\beta + N, \]

so, in an obvious notation, $\mathcal{F}_1 \subset BV(2\beta + N)$. From standard results on
estimation of functions of bounded variation in white noise [10, 21], we
know that

\[ \inf_{h > 0} \mathcal{R}_n(L_h; BV(v)) \le C v n^{-1/2}, \]

which implies (7.2). Nevertheless, we spell out here an argument based on
bias-variance trade-off, because this sort of trade-off will be used again
repeatedly below.

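Write the mean-squared error as squared bias plus variance: $\mathcal{R}_n(L_h; f) =
B^2 + V$, where

\[ B^2 = \frac{1}{n} \sum_{i=1}^{n} (E[L_h(i)] - f(i/n))^2 \quad \text{and} \quad V = \frac{1}{n} \sum_{i=1}^{n} \mathrm{var}[L_h(i)]. \]

For the variance, since the $\{Y_n(j) : j \in I_n\}$ are pairwise uncorrelated and
their variance is equal to $\sigma^2$, we have

\[ \mathrm{var}[L_h(i)] = \frac{\sigma^2}{\#W(i)} \le \frac{\sigma^2}{nh}. \]

Therefore,

\[ V \le \frac{\sigma^2}{nh}. \tag{7.3} \]

For the bias, recall that $E[Y_n(j)] = f(j/n)$ for all $j \in I_n$, so

\[ E[L_h(i)] - f(i/n) = \frac{1}{\#W(i)} \sum_{j \in W(i)} (f(j/n) - f(i/n)). \]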

Now consider separately cases where $i$ is "near to" and "far from" the
discontinuity. Specifically, define:

• Near: $A = \{i \in I_n : \min_s |i/n - x_s| \le h\}$; in words, $A$ is the set of points
where the smoothing window does meet the discontinuity.

• Far: $A^c = \{i \in I_n : \min_s |i/n - x_s| > h\}$; $A^c$ is the set of points where the
smoothing window does not meet the discontinuity.

Near the discontinuity, use the fact that as $f$ takes values in $[0, 1]$,

\[ |E[L_h(i)] - f(i/n)| \le 1. \tag{7.4} \]

Far from the discontinuity, we apply a sharper estimate that we now
develop. Let $i \in A^c$ and consider $j \in W(i)$. The local Lipschitz constant
bound $\beta$ gives

\[ |f(j/n) - f(i/n)| \le h \sup_{x \in [i/n, j/n]} L_x(f) \le \beta h, \tag{7.5} \]

which implies

\[ |E[L_h(i)] - f(i/n)| \le \beta h, \qquad i \in A^c. \tag{7.6} \]

Combining (7.4) and (7.6), we bound the squared bias by

\[ B^2 \le \frac{\#A^c}{n} \beta^2 h^2 + \frac{\#A}{n}. \tag{7.7} \]

The number of "near" terms obeys $0 \le \#A \le N(2nh + 1)$, and of course the
fraction of "far" terms obeys $\#A^c/n \le 1$, so we get

\[ B^2 \le \beta^2 h^2 + 2Nh + N/n \le Ch, \tag{7.8} \]

since $h \ge 1/n$; we may take $C = (\beta^2 + 3N)$.

Hence, $\mathcal{R}_n(L_h; f) \le C(h + 1/(nh))$, and this bound does not depend on
$f \in \mathcal{F}_1$, so (7.1) follows.
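
The bias-variance trade-off behind (7.1) and (7.2) can be checked with a small grid search. This is only a sketch: the generic constant $C$ is set to 1, which is an assumption of the example, and the function names are hypothetical. The bound $h + 1/(nh)$ is minimized at $h = n^{-1/2}$ with minimum value $2 n^{-1/2}$.

```python
def linear_risk_bound(h, n, c=1.0):
    """Right-hand side of (7.1) with the generic constant set to c."""
    return c * (h + 1.0 / (n * h))


def best_bandwidth(n, grid_size=2000):
    """Grid search over log-spaced h in [1/n, 1]; returns (h, bound value)."""
    best_h, best_val = None, float("inf")
    for k in range(grid_size + 1):
        h = n ** (-1.0 + k / grid_size)  # runs from 1/n (k = 0) up to 1
        val = linear_risk_bound(h, n)
        if val < best_val:
            best_h, best_val = h, val
    return best_h, best_val


h_star, bound = best_bandwidth(10_000)
print(h_star, bound)
```

For $n = 10^4$ the search should land near $h = 0.01$ with a bound near $0.02$, matching $n^{-1/2}$ and $2 n^{-1/2}$.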

7.2. Lower bound. Let $f$ be the indicator function of the interval $[1/2, 1]$.
Then $f \in \mathcal{F}_1$ for all $N \ge 1$ and $\beta > 0$.

For the variance, since $\#W(i) \le 3nh$, we have

\[ V \ge \frac{\sigma^2}{3nh}. \]

For the squared bias, we show that the pointwise bias is large near the
discontinuity. For example, take $n/2 - nh/2 \le i < n/2$, so that $f(i/n) = 0$
and therefore

\[ |E[L_h(i)] - f(i/n)| = E[L_h(i)] = \frac{\#\{j \in W(i) : j/n \ge 1/2\}}{\#W(i)}. \]

Since $\#\{j \in W(i) : j/n \ge 1/2\} \ge nh/2$ and $\#W(i) \le 3nh$, the pointwise bias
exceeds 1/6:

\[ |E[L_h(i)] - f(i/n)| \ge \frac{nh/2}{3nh} = 1/6. \]

Therefore,

\[ B^2 \ge \frac{\#\{i \in I_n : n/2 - nh/2 \le i < n/2\}}{n} \left(\frac{1}{6}\right)^2 \ge Ch, \]

where $C = 1/72$ will do.

Combining bias and variance bounds, we have for any choice of radius $h$,

\[ \mathcal{R}_n(L_h; f) \ge C\left(h + \frac{1}{nh}\right) \ge C n^{-1/2}. \]

8. Proof of Theorem 2.2. Now we analyze median filtering. The proof 
parallels that for Theorem 2.1 and uses the results of Section 6. Here too, 
we use abbreviated notation. 

8.1. Upper bound. Fix $f \in \mathcal{F}_1$. Let $x_1, \ldots, x_N \in (0, 1)$ be the points where
$f$ may be discontinuous. Without loss of generality, we again let $h \ge 1/n$.
We will show that

\[ \mathcal{R}_n(M_h; \mathcal{F}_1) \le C\left(h + \frac{1}{nh}\right). \tag{8.1} \]

As in the proof of Theorem 2.1, picking $h_n = n^{-1/2}$ in (8.1) implies our
desired upper bound, namely:

\[ \inf_{h > 0} \mathcal{R}_n(M_h; \mathcal{F}_1) \le C n^{-1/2}. \]

To get started, we invoke the monotonicity and Lipschitz properties of
the median (6.1)-(6.2), yielding

\[ |M_h(i) - f(i/n)| \le \max_{j \in W(i)} |f(j/n) - f(i/n)| + \sigma |Z(i)|, \]

where $Z(i) = \mathrm{Median}\{Z_n(j) : j \in W(i)\}$.

Near the discontinuity, we again observe that since $f$ takes values in $[0, 1]$,
we have

\[ \max_{j \in W(i)} |f(j/n) - f(i/n)| \le 1 \]

for all $i \in I_n$, and so

\[ |M_h(i) - f(i/n)| \le 1 + \sigma |Z(i)| \qquad \forall i \in I_n. \tag{8.2} \]

Now consider the set $A^c = \{i \in I_n : \min_s |i/n - x_s| > h\}$ far from the
discontinuity. Using (7.5), we get

\[ |M_h(i) - f(i/n)| \le \beta h + \sigma |Z(i)| \qquad \forall i \in A^c. \tag{8.3} \]

Using (8.2) and (8.3), together with $(a + b)^2 \le 2a^2 + 2b^2$, we get

\[ \mathcal{R}_n(M_h; f) \le 2\frac{\#A^c}{n} \beta^2 h^2 + 2\frac{\#A}{n} + \frac{2\sigma^2}{n} \sum_{i=1}^{n} E[Z(i)^2]. \tag{8.4} \]

The term on the far right is a variance term, which can be handled using
Lemma 6.1 at sample size $m = \#W(i) \ge nh$, yielding variance $V \le C/(nh)$.
The bias terms involve $\#A$ and $\#A^c$ and are completely analogous to the
case of linear filtering and are handled just as at (7.8), using $0 \le \#A \le
N(2nh + 1)$. We obtain $\mathcal{R}_n(M_h; f) \le C(h + 1/(nh))$. Since this does not
depend on $f \in \mathcal{F}_1$, (8.1) follows.
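
The upper bounds (7.1) and (8.1) have the same form, $C(h + 1/(nh))$. The simulation below is a minimal sketch of what this looks like on a single noisy step, under assumptions made for this example only: Gaussian noise with $\sigma = 1$, a step at $1/2$, window half-width $\mathrm{round}(n^{1/2})$ (that is, $h = n^{-1/2}$), and windows truncated at the ends of the grid. All function names are hypothetical.

```python
import random
import statistics


def moving_average(y, k):
    """Linear filter: mean over the window |j - i| <= k, truncated at the ends."""
    return [statistics.fmean(y[max(0, i - k): i + k + 1]) for i in range(len(y))]


def moving_median(y, k):
    """Median filter over the same truncated window."""
    return [statistics.median(y[max(0, i - k): i + k + 1]) for i in range(len(y))]


def mse(estimate, truth):
    return sum((a - b) ** 2 for a, b in zip(estimate, truth)) / len(truth)


def compare(n=2000, sigma=1.0, seed=0):
    """Return (MSE of linear filter, MSE of median filter) on a noisy step."""
    rng = random.Random(seed)
    truth = [1.0 if i >= n // 2 else 0.0 for i in range(n)]
    y = [t + sigma * rng.gauss(0.0, 1.0) for t in truth]
    k = round(n ** 0.5)  # window half-width n * h with h = n ** -0.5
    return mse(moving_average(y, k), truth), mse(moving_median(y, k), truth)


linear_mse, median_mse = compare()
print(linear_mse, median_mse)
```

At constant per-pixel noise level one would expect the two errors to be of the same order, with neither filter dramatically better; a single run like this illustrates the rates but does not, of course, prove them.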

8.2. Lower bound. Let $f$ be the indicator function of the interval $[1/2, 1]$.
Surely, $f \in \mathcal{F}_1$ for any $N \ge 1$ and $\beta > 0$.
For $i \in A^c$,

\[ |M_h(i) - f(i/n)| = \sigma |Z(i)|, \]

so that, by Lemma 6.1,

\[ E[(M_h(i) - f(i/n))^2] \ge \frac{C}{nh}. \tag{8.5} \]

Therefore,

\[ \frac{1}{n} \sum_{i \in A^c} E[(M_h(i) - f(i/n))^2] \ge \frac{C}{nh}. \]

For $i \in A$, we view the window as consisting of a mixture of "good" data,
on the same side of the discontinuity as $i$, together with "bad" data, on the
other side. Thus

\[ |M_h(i) - f(i/n)| = \sigma |\Lambda_n(i)|, \]

where, with $w(i) = \#W(i) \asymp nh$, and

\[ \rho(i) = \frac{1}{w(i)} \#\{j \in W(i) : j/n \text{ and } i/n \text{ on the same side of the discontinuity}\}, \]

we have

\[ \Lambda_n(i) \sim \mathrm{Median}\{Z_1, \ldots, Z_{\rho(i)w(i)}, Z_{1+\rho(i)w(i)} + 1/\sigma, \ldots, Z_{w(i)} + 1/\sigma\}. \]

This is exactly a median of contaminated data as discussed in Section
6.5, with $\Delta = 1/\sigma$, $m + n = w(i)$, $n = \rho(i)w(i)$ and $\varepsilon = 1 - \rho(i)$. Invoking
Lemma 6.4 with $\mu = \mu(1 - \rho(i), 1/\sigma)$, applying (6.8) for $\varepsilon_0 = 1/5$ and
setting $C = \sigma^2 \mu^2(1/5, 1/\sigma)/2$, we have

\[ E[(M_h(i) - f(i/n))^2] \ge C \qquad \forall i \text{ such that } \rho(i) \le 4/5. \tag{8.6} \]

Since $\#\{i : \rho(i) \le 4/5\} \asymp nh$, we get

\[ \frac{1}{n} \sum_{i \in A} E[(M_h(i) - f(i/n))^2] \ge Ch. \]

Combining pieces, we get for any choice of $h$

\[ \mathcal{R}_n(M_h; f) \ge C\left(h + \frac{1}{nh}\right) \ge C n^{-1/2}, \]

which matches the upper bound.

9. Proof of Theorem 4.1. We turn to the setting of Section 4: asymptotically
negligible noise per pixel: $\sigma = \sigma_n = o(1)$.

9.1. Linear filtering. By just carrying the variance term in Section 7 and
comparing with the no-smoothing rate, we immediately see that

\[ \mathcal{R}(L_h; \text{sep-pLip}) \asymp \sigma_n n^{-1/2} \wedge \sigma_n^2. \]

Hence, linear filtering improves on no-smoothing if, and only if, $\sigma_n n^{1/2} \to \infty$.

9.2. Upper bound for median filtering. We refine the argument from
Section 8 in the case where the discontinuities are well-separated, working
with the class sep-pLip as defined in Section 5. Note that (9.7) will be used in
the proof of Theorem 5.1.

Assuming $n \ge 2/\eta$, choose $h \in [1/n, \eta/2)$. Since median filtering is local
and the discontinuities are $\eta$-separated, we may assume that $f$ only has
$N = 1$ discontinuity point $x_1$.

Far from the discontinuity, at $i \in A^c$, we use (8.3) and Lemma 6.1 to get

\[ E[(M_h(i) - f(i/n))^2] \le C\left(h^2 + \frac{\sigma_n^2}{nh}\right), \qquad i \in A^c; \tag{9.1} \]

note that $\sigma_n$ is now nonconstant.

Near the discontinuity, we now take more seriously the viewpoint that the
window contains "good" data (on the same side of the discontinuity) and
"bad" data (on the other side) and we bound the MSE more carefully than
before.

So take $i \in A$. Define the "good" subset $\mathcal{G}(i)$, the subset of the window
on the same side of the discontinuity as $i$, by

\[ \mathcal{G}(i) = \begin{cases} W(i) \cap [n x_1, n], & i/n \ge x_1, \\ W(i) \cap [1, n x_1), & i/n < x_1. \end{cases} \tag{9.2} \]

Let $\rho(i) = \#\mathcal{G}(i)/\#W(i)$ and $\varepsilon(i) = 1 - \rho(i)$. The window $W(i)$ provides an
$\varepsilon$-contaminated sample in the sense of Huber [18].

By the contamination bias bound (6.4), $M_h(i)$ lies between the $\frac{1-2\varepsilon}{2(1-\varepsilon)} =
(2\rho(i) - 1)/(2\rho(i))$ and $\frac{1}{2(1-\varepsilon)} = 1/(2\rho(i))$ quantiles of $\{Y_n(j) : j \in \mathcal{G}(i)\}$. Also,
as in (7.5), we have

\[ |f(j/n) - f(i/n)| \le \beta h \qquad \forall j \in \mathcal{G}(i), \]

so the quantiles of these "good data" $\{Y_n(j) : j \in \mathcal{G}(i)\}$ are small perturbations
of the quantiles of corresponding zero-median data $\{Z_n(j) : j \in \mathcal{G}(i)\}$.
Let $Q_n(i)$ denote the maximum absolute value of the empirical $(2\rho(i) -
1)/(2\rho(i))$ and $1/(2\rho(i))$ quantiles of $\{Z_n(j) : j \in \mathcal{G}(i)\}$. By (6.2) and (6.4),

\[ |M_h(i) - f(i/n)| \le \beta h + \sigma_n Q_n(i). \tag{9.3} \]

We combine (9.3) with (8.2) and Lemma 6.1 to get

\[ E[(M_h(i) - f(i/n))^2] \le C E[(1 + \sigma_n^2 Z(i)^2) \wedge (\beta^2 h^2 + \sigma_n^2 Q_n(i)^2)] \]
\[ \le C (1 + \sigma_n^2 E[Z(i)^2]) \wedge (\beta^2 h^2 + \sigma_n^2 E[Q_n(i)^2]) \]
\[ \le C (h^2 + 1 \wedge \sigma_n^2 E[Q_n(i)^2]), \]

where for the last inequality we used Lemma 6.1 together with $h \ge 1/n$, and
the fact that $a \wedge (b + c) \le (a \wedge b) + c$ for any $a, b, c \ge 0$.

Recalling Section 6.4, let $Z_{m,p}$ denote the empirical $p$-quantile of $Z_1, \ldots, Z_m$,
a sample from $\Psi$. Because $\Psi$ is symmetric about 0, $|Q_n(i)|$ is stochastically
majorized by $2|Z_{m(i),p(i)}|$, with $m(i) = \#\mathcal{G}(i) = \rho(i)\#W(i)$ and $p(i) =
1/(2\rho(i))$. Hence

\[ E[Q_n(i)^2] \le 4 E[Z_{m(i),p(i)}^2]. \tag{9.4} \]

Using (9.4) and Lemma 6.3, we obtain, for $i \in A$,

\[ E[(M_h(i) - f(i/n))^2] \le C (h^2 + 1 \wedge \sigma_n^2 (p(i)(1 - p(i)))^{-2\alpha+2}). \]

Therefore,

\[ \mathcal{R}_n(M_h; f) \le C\left(h^2 + \frac{\sigma_n^2}{nh} + \frac{1}{n} \sum_{i \in A} 1 \wedge \sigma_n^2 (p(i)(1 - p(i)))^{-2\alpha+2}\right). \tag{9.5} \]

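We focus on the last term on the right-hand side.
Let $\delta(i) = |i/n - x_1|$; since $i \in A$, $\delta(i) \le h$. We have

\[ \rho(i) = \frac{\lfloor n\delta(i) \rfloor + \lfloor nh \rfloor + 1}{2\lfloor nh \rfloor + 1} \ge \frac{1}{2} + C\frac{\delta(i)}{h}. \]

So there is a constant $C > 0$ such that

\[ p(i) \le 1 - C\delta(i)/h \qquad \forall i \in A. \tag{9.6} \]

Note that we always have $p(i) \ge 1/2$.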
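Therefore

\[ \frac{1}{n} \sum_{i \in A} 1 \wedge \sigma_n^2 (p(i)(1 - p(i)))^{-2\alpha+2} \le \frac{C}{n} \sum_{i \in A} 1 \wedge \sigma_n^2 (\delta(i)/h)^{-2\alpha+2} \]
\[ \le C h \cdot \frac{1}{nh} \sum_{i=1}^{nh} 1 \wedge \sigma_n^2 (i/(nh))^{-2\alpha+2} \]
\[ \le C h \int_0^1 1 \wedge \sigma_n^2 s^{-2\alpha+2} \, ds = C h v_n, \]

with

\[ v_n = \begin{cases} \sigma_n^2, & \text{if } \zeta > 3, \\ \sigma_n^2 \log(1/\sigma_n), & \text{if } \zeta = 3, \\ \sigma_n^{\zeta-1}, & \text{if } \zeta < 3. \end{cases} \]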

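Combining pieces gives

\[ \mathcal{R}_n(M_h; f) \le C\left(h^2 + \frac{\sigma_n^2}{nh} + h v_n\right). \tag{9.7} \]

Optimizing the right-hand side over $h$, we get

\[ \mathcal{R}_n(M_h; f) \le C (v_n^{1/2} \vee \sigma_n^{1/3} n^{-1/6}) \cdot \sigma_n n^{-1/2} = o(\sigma_n n^{-1/2}). \]

This bound improves on no-smoothing if $\sigma_n n \to \infty$.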
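
The optimization of (9.7) over $h$ can be reproduced numerically. The sketch below sets every generic constant to 1 (an assumption of this example), takes $v_n$ from the three cases given above, and compares a grid minimum over $h \in [1/n, 1]$ with the closed-form rate $(v_n^{1/2} \vee \sigma_n^{1/3} n^{-1/6}) \sigma_n n^{-1/2}$. Function names are hypothetical, and $0 < \sigma_n < 1$ is assumed.

```python
import math


def v_n(sigma, zeta):
    """The quantity v_n for the three decay regimes (constants set to 1)."""
    if zeta > 3:
        return sigma ** 2
    if zeta == 3:
        return sigma ** 2 * math.log(1.0 / sigma)
    return sigma ** (zeta - 1)


def median_bound(h, n, sigma, zeta):
    """Right-hand side of (9.7) with C = 1."""
    return h ** 2 + sigma ** 2 / (n * h) + h * v_n(sigma, zeta)


def optimized_bound(n, sigma, zeta, grid_size=4000):
    """Minimum of the bound over log-spaced h in [1/n, 1]."""
    return min(
        median_bound(n ** (-1.0 + k / grid_size), n, sigma, zeta)
        for k in range(grid_size + 1)
    )


def closed_form_rate(n, sigma, zeta):
    """(v_n^{1/2} v sigma^{1/3} n^{-1/6}) * sigma * n^{-1/2}."""
    return max(math.sqrt(v_n(sigma, zeta)), sigma ** (1 / 3) * n ** (-1 / 6)) * sigma / math.sqrt(n)


for zeta in (2, 3, 4):
    ratio = optimized_bound(10 ** 6, 0.1, zeta) / closed_form_rate(10 ** 6, 0.1, zeta)
    print(zeta, round(ratio, 3))
```

The ratio of the numerical minimum to the closed form should stay bounded by a modest constant across regimes, which is all the displayed rate claims.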

9.3. Lower bound for median filtering. A lower bound is not needed to 
prove Theorem 4.1. However, we will use the following lower bound in the 
proof of Theorem 5.1. 

Let $f$ be the indicator function of the interval $[1/2, 1]$. A lower bound is
obtained by using the arguments in Section 8.2, but this time carrying $\sigma_n$
along and noticing that $\mu(\varepsilon, \Delta)$ is increasing in $\Delta$. One gets

\[ \mathcal{R}_n(M_h; f) \ge C\left(h \sigma_n^2 + \frac{\sigma_n^2}{nh}\right). \tag{9.8} \]

This bound matches the upper bound, for example, when $\zeta > 3$ and $\sigma_n n^{1/4} \to
\infty$. This is the setting that will arise in Section 13.

10. Proof of Theorem 3.1. We consider two-dimensional linear filtering. 
The structure of the argument parallels the one-dimensional case presented 
in Section 7. The main difference involves counting points near to disconti- 
nuities. 



10.1. Upper bound. We also write $\mathcal{F}_2$ in place of cpLip. Fix $f \in \mathcal{F}_2$. We
call $\gamma_1, \ldots, \gamma_N \in \Gamma$ the curves where $f$ may be discontinuous.
As before, we write MSE $= B^2 + V$.

Again, we may assume $h \ge 1/n$. Since $\#W(i) \ge (nh)^2$, we have $V \le
1/(nh)^2$.

In the two-dimensional case, we define proximity to singularity as follows.
Write $d(A, B)$ for Hausdorff distance between subsets $A$ and $B$ of the unit
square:

• Far: Let $A^c = \{i \in I_n^2 : \min_s d(i/n, \gamma_s) > h\}$.

• Near: Let $A = \{i \in I_n^2 : \min_s d(i/n, \gamma_s) \le h\}$.

Using the exact same arguments as in Section 7, we obtain the equivalent
of (7.7):

\[ B^2 \le \frac{\#A^c}{n^2} \beta^2 h^2 + \frac{\#A}{n^2} \le \beta^2 h^2 + \frac{\#A}{n^2}. \]

Lemma 16.1 provides an estimate for $\#A$ which, when used in the above
expression, implies $B^2 \le Ch$.

Thus, $\mathcal{R}_n(L_h; f) \le C(h + 1/(nh)^2)$. The right-hand side does not depend
on $f \in \mathcal{F}_2$, so

\[ \mathcal{R}_n(L_h; \mathcal{F}_2) \le C\left(h + \frac{1}{(nh)^2}\right). \]

Minimizing the right-hand side over $h \ge 1/n$ gives $h = n^{-2/3}$, yielding
$\mathcal{R}(L_h; \mathcal{F}_2) \le C n^{-2/3}$.

10.2. Lower bound. Fix $0 < \zeta < (1/2) \wedge (\lambda/4)$ and let $f$ be the indicator
function of the axis-aligned square of sidelength $\zeta$ centered at $(1/2, 1/2)$,
namely $f = 1_S$ where $S = [1/2 - \zeta/2, 1/2 + \zeta/2] \times [1/2 - \zeta/2, 1/2 + \zeta/2]$.
Certainly, $f \in \mathcal{F}_2$.

Again, the variance $V \ge 1/(3nh)^2$. For the squared bias $B^2$, we show that
the pointwise bias is of order 1 near the discontinuity. For example, take
$i \in I_n^2$ such that

\[ i/n \in [1/2 - \zeta/2 - h/2, 1/2 - \zeta/2] \times [1/2 - \zeta/2, 1/2 + \zeta/2], \]

so that $f(i/n) = 0$ and therefore the bias obeys

\[ |E[L_h(i)] - f(i/n)| = E[L_h(i)] = \frac{\#\{W(i) \cap nS\}}{\#W(i)}. \]

For such $i$, $\#\{W(i) \cap nS\}$ is of order $(nh)^2$, since the intersection of the
disc of radius $h$ centered at $i/n$ with $S$ contains a square of sidelength $Ch$.
Therefore, the bias is of order 1 for such $i$, and there are order $n^2 h$ such $i$.
Hence the squared bias $B^2$ is at least of order $h$.
Combining pieces, we get for all $h$,

\[ \mathcal{R}_n(L_h; f) \ge C\left(h + \frac{1}{(nh)^2}\right) \ge C n^{-2/3}. \]

11. Proof of Theorem 3.2. The structure of the proof is identical to the 
case of one-dimensional signals presented in Section 8. In the details, the 
only significant difference is on computing the number of points away from 
discontinuities. We use the same definitions A and A c as in Section 8. 



11.1. Upper bound. Fix $f \in \mathcal{F}_2$. We call $\gamma_1, \ldots, \gamma_N \in \Gamma$ the curves where
$f$ may be discontinuous. Again, we may assume $h \ge 1/n$.

Define $A^c = \{i \in I_n^2 : \min_s d(i/n, \gamma_s) > h\}$. Using the exact same arguments
as in Section 8, we obtain the equivalent of (8.4):

\[ \mathcal{R}_n(M_h; f) \le 2\frac{\#A^c}{n^2} \beta^2 h^2 + 2\frac{\#A}{n^2} + \frac{2\sigma^2}{n^2} \sum_{i \in I_n^2} E[Z(i)^2]. \]

Lemma 16.1 provides an estimate for $\#A$ which, when used in the above
expression, leads to $\mathcal{R}_n(M_h; f) \le C(h + 1/(nh)^2)$. From there we conclude as
in Section 10.1.

11.2. Lower bound. Let $f \in \mathcal{F}_2$ be the indicator function of a disc $D$.
For $i \in A^c$, the equivalent of (8.5) holds and together with Lemma 16.1
implies

\[ \frac{1}{n^2} \sum_{i \in A^c} E[(M_h(i) - f(i/n))^2] \ge \frac{C}{(nh)^2}. \]

The equivalent of (8.6) holds as well and implies

\[ \frac{1}{n^2} \sum_{i \in A} E[(M_h(i) - f(i/n))^2] \ge \frac{C}{n^2} \#\{i : \rho(i) \le 4/5\}. \]

We now show that, for $i/n \in D^c$ such that $\delta(i) \ge 1/n$, $\rho(i) \le 1/2 + C\delta(i)/h$.
Let $y \in \partial D$ be the closest point to $i/n$ and $L$ the tangent to $\partial D$ at $y$. $L$
divides $B(i/n, h)$ into two parts $A$ and $B(i/n, h) \cap A^c$, where $i/n \in A$. We
have $A = A_0 \cup H$, where $H$ is the open half disc with diameter parallel to
$L$ that does not intersect $\partial D$. We have $\mathcal{G}(i) \subset I_n^2 \cap nA$, so that $\#\mathcal{G}(i) \le
\#W(i)/2 + \#(I_n^2 \cap nA_0)$. $A_0$ is contained within a rectangular region $R$ with
dimensions $\delta(i)$ by $2h$ and for any rectangular region, $\#(I_n^2 \cap nR) \le C|R|n^2 + O(nh)$.
Hence, since $n\delta(i) \ge 1$,

\[ \#(I_n^2 \cap nA_0) \le \#(I_n^2 \cap nR) \le C h \delta(i) n^2. \]

Therefore, $\rho(i) \le 1/2 + C\delta(i)/h$.
We thus have

\[ \#\{i : \rho(i) \le 4/5\} \ge \#\{i : 1/n \le \delta(i) \le Ch\}. \]

Let $K = \{x : d(x, \partial D) \le Ch\}$. We have $nh \ge 1$ so that

\[ K \subset \bigcup_{i : \delta(i) \le Ch} B(i/n, 2/n), \]

which implies $|K| \le C \#\{i : \delta(i) \le Ch\}/n^2$. By elementary calculus, $|K| \ge
Ch$, so

\[ \#\{i : \rho(i) \le 4/5\} \ge C n^2 h. \]

We obtain for all $h$

\[ \mathcal{R}_n(M_h; f) \ge C\left(\frac{1}{(nh)^2} + h\right) \ge C n^{-2/3}. \]

12. Proof of Theorem 4.2. We consider again median filtering in the
negligible-noise-per-pixel case of Section 4, this time in the two-dimensional
setting.

12.1. Linear filtering. By just carrying the variance term in Section 10
and comparing with the no-smoothing rate, we immediately see that

\[ \mathcal{R}(L_h; \text{sep-cpLip}) \asymp \sigma_n^{2/3} n^{-2/3} \wedge \sigma_n^2. \]

Hence, linear filtering improves on no-smoothing if, and only if, $\sigma_n n^{1/2} \to \infty$.

12.2. Upper bound for median filtering. We refine our arguments in the
setting of asymptotically negligible noise level $\sigma = \sigma_n = o(1)$ with
well-separated discontinuities, working with the class sep-cpLip as defined in
Section 5. Note that (12.1) will be used in the proof of Theorem 5.2.

Letting $n \ge 2/\eta$, choose $h \in [1/n, \eta/2)$. Since median filtering is local and
the discontinuities are at least $\eta$ apart, we may assume that $N = 1$, namely
that $f$ only has one discontinuity curve $\gamma \in \Gamma^+$. Being a Jordan curve, $\gamma$
partitions $[0, 1]^2$ into two regions, the inside ($\Omega$) and the outside ($\Omega^c$).

We proceed as in Section 8, introducing $\delta(i) = d(i/n, \gamma)$ and

\[ \mathcal{G}(i) = \begin{cases} W(i) \cap \Omega, & \text{if } i/n \in \Omega, \\ W(i) \cap \Omega^c, & \text{if } i/n \in \Omega^c, \end{cases} \]

together with $\rho(i) = \#\mathcal{G}(i)/\#W(i)$ and $p(i) = 1/(2\rho(i))$.

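Using the exact same arguments as in Section 8, we obtain the equivalent
of (9.5):

\[ \mathcal{R}_n(M_h; f) \le C\left[h^2 + \frac{\sigma_n^2}{(nh)^2} + \frac{1}{n^2} \sum_{i \in A} 1 \wedge \sigma_n^2 (p(i)(1 - p(i)))^{-2\alpha+2}\right]. \]

We bound the last term on the right-hand side by

\[ \frac{1}{n^2} \sum_{i \in A \cap A_1^c} 1 + \frac{1}{n^2} \sum_{i \in A_1} 1 \wedge \sigma_n^2 (p(i)(1 - p(i)))^{-2\alpha+2}, \]

where $A_1 = \{i \in I_n^2 : \delta(i) > 2(C_1 h^2 + n^{-1})\}$, the constant $C_1 > 0$ being given
by Lemma 16.2; the second term in this last expression represents the bias
due to the curvature of the discontinuity.
For $\ell = 0, \ldots, \lfloor nh \rfloor$, define

\[ \Xi_\ell = \{i \in I_n^2 : \ell \le n\delta(i) < \ell + 1\}. \]

We use Lemma 16.4 to get

\[ \frac{1}{n^2} \sum_{\ell=0}^{2n(C_1 h^2 + n^{-1})} \#\Xi_\ell \le \frac{C}{n} \sum_{\ell=0}^{2n(C_1 h^2 + n^{-1})} 1 \le C (h^2 \vee n^{-1}). \]

We use Lemmas 16.2 and 16.4, and replicate the computations below (9.6)
to get

\[ \frac{1}{n^2} \sum_{i \in A_1} 1 \wedge \sigma_n^2 (p(i)(1 - p(i)))^{-2\alpha+2} \le \frac{C}{n^2} \sum_{\ell=1}^{nh} \#\Xi_\ell \cdot (1 \wedge \sigma_n^2 (\ell/(nh))^{-2\alpha+2}) \]
\[ \le \frac{C}{n} \sum_{\ell=1}^{nh} (1 \wedge \sigma_n^2 (\ell/(nh))^{-2\alpha+2}) \]
\[ \le C h v_n. \]

Combining inequalities,

\[ \mathcal{R}_n(M_h; f) \le C\left[h^2 + \frac{\sigma_n^2}{(nh)^2} + h v_n\right]. \tag{12.1} \]

Optimizing the right-hand side over $h$, we get

\[ \mathcal{R}_n(M_h; f) \le C (v_n^{2/3} \vee \sigma_n^{1/3} n^{-1/3}) \cdot \sigma_n^{2/3} n^{-2/3} = o(\sigma_n^{2/3} n^{-2/3}). \]

This bound improves on no-smoothing if $\sigma_n n \to \infty$.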

12.3. Lower bound for median filtering. This is not needed to prove
Theorem 4.2. However, we will use the following lower bound in the proof of
Theorem 5.2.

Let $f$ be the indicator function of a disc $D$ such that $f$ belongs to sep-cpLip. The
low-noise-per-pixel case comes again from carrying $\sigma_n$ along, yielding

\[ \mathcal{R}_n(M_h; f) \ge C\left(h \sigma_n^2 + \frac{\sigma_n^2}{(nh)^2}\right). \tag{12.2} \]

This bound matches the upper bound, for example, when $\zeta > 3$ and $\sigma_n n^{1/3} \to
\infty$. This is the setting that will arise in Section 14.

13. Proof of Theorem 5.1. 



13.1. Upper bound. Without loss of generality, fix $\sigma = 1$. Let $1/n \le h_1 <
h_2 < 1$ to be chosen later as functions of $n$. We only need consider $h_1 \gg 1/n$,
for otherwise the first pass does not reduce the noise level significantly. Also,
for simplicity we assume that both $n_1 = h_1^{-1}$ and $nh_1$ are integers.
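
The two-pass construction with bandwidths $h_1 < h_2$ can be sketched as follows. This is a simplified illustration, not the exact procedure analyzed here: the first pass takes medians over consecutive non-overlapping blocks of $nh_1$ points (an assumption of this sketch), the second pass runs a moving median with half-width $k = n_1 h_2$ blocks on the coarse sequence, and every fine-scale index inherits the value of its block. Function names are hypothetical.

```python
import statistics


def block_medians(y, block):
    """First pass: medians over consecutive non-overlapping blocks of length `block`."""
    return [statistics.median(y[s:s + block]) for s in range(0, len(y), block)]


def moving_median(y, k):
    """Second pass: median over the window |j - i| <= k, truncated at the ends."""
    return [statistics.median(y[max(0, i - k): i + k + 1]) for i in range(len(y))]


def two_stage_median(y, block, k):
    """Fine-scale block medians followed by a coarse-scale moving median."""
    coarse = moving_median(block_medians(y, block), k)
    # Every fine-scale index inherits the value of the block that contains it.
    return [coarse[i // block] for i in range(len(y))]


# A noise-free step whose edge sits on a block boundary.
step = [0.0] * 60 + [1.0] * 60
print(two_stage_median(step, block=5, k=3) == step)
```

With the choices made later in this section, $h_1 = n^{-2/3}$ and $h_2 = n^{-1/3}$, both the block length and the coarse half-width are of order $n^{1/3}$; for instance $n = 8000$ gives blocks of 20 points and a half-width of 20 blocks.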

Fix $f \in \mathcal{F}_1$. Again, we may assume that $N = 1$ without loss of generality.
Call $x_1 \in (0, 1)$ the point where $f$ is discontinuous and let

\[ \{k_1\} = \{k : x_1 \in (B_k^{h_1}/n)\}. \]

Using (6.1)-(6.2), we have

\[ Y_n^{h_1}(k) = f(kh_1) + Z_n^{h_1}(k) + V_n^{h_1}(k), \]

where

\[ Y_n^{h_1}(k) = \mathrm{Median}\{Y_n(i) : i \in B_k^{h_1}\}, \]
\[ Z_n^{h_1}(k) = \mathrm{Median}\{Z_n(i) : i \in B_k^{h_1}\}, \]
\[ |V_n^{h_1}(k)| \le U_n^{h_1}(k) = \max_{i \in B_k^{h_1}} |f(i/n) - f(kh_1)|. \]

Since for $k \ne k_1$, $f$ is locally Lipschitz in $B_k^{h_1}$, we have

\[ |U_n^{h_1}(k)| \le \begin{cases} \beta h_1, & k \ne k_1, \\ 1, & k = k_1. \end{cases} \]



Let $\sigma'_n = (2nh_1 + 1)^{-1/2}$ and define $Z'_n(k) = Z_n^{h_1}(k)/\sigma'_n$. The $Z'_n(k)$'s are
independent and identically distributed, and by Lemma 6.2, their distribution
$\Psi'_n$ satisfies [Shape] and [Decay] with $\zeta(\Psi'_n) = 4$ and implicit constant
independent of $n$. Define $Y'_n(k) = f(kh_1) + \sigma'_n Z'_n(k)$. For $i \in B_k^{h_1}$, we have

\[ |M_{h_1,h_2}[Y_n](i) - f(i/n)| \]
\[ \le |M_{h_2}[Y_n^{h_1}](k) - M_{h_2}[Y'_n](k)| + |M_{h_2}[Y'_n](k) - f(kh_1)| + |f(kh_1) - f(i/n)| \]
\[ \le |M_{h_2}[Y'_n](k) - f(kh_1)| + 2|U_n^{h_1}(k)|. \]

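Hence, using the bounds on $U_n^{h_1}(k)$, we have

\[ \mathcal{R}_n(M_{h_1,h_2}; f) = \frac{1}{n} \sum_{k \in I_{n_1}} \sum_{i \in B_k^{h_1}} E[(M_{h_1,h_2}[Y_n](i) - f(i/n))^2] \]
\[ \le C \frac{1}{n_1} \sum_{k \in I_{n_1}} \left( E[(M_{h_2}[Y'_n](k) - f(k/n_1))^2] + U_n^{h_1}(k)^2 \right) \]
\[ \le C \frac{1}{n_1} \sum_{k \in I_{n_1}} E[(M_{h_2}[Y'_n](k) - f(kh_1))^2] + C h_1. \]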

Using the upper bound (9.7) on the first term, which we may use since we
are back to the original situation, we get

\[ \mathcal{R}_n(M_{h_1,h_2}; f) \le C\left(h_2^2 + \frac{(\sigma'_n)^2}{n_1 h_2} + h_2 (\sigma'_n)^2\right) + C h_1. \]

We then replace $\sigma'_n$ by its definition $(2nh_1 + 1)^{-1/2}$ and minimize over $h_1$
and $h_2$, with $h_1 = n^{-2/3}$ and $h_2 = n^{-1/3}$, and obtain the desired upper bound
valid for any such $f$.

13.2. Lower bound. Fix $h_1 < h_2$. We again assume for convenience that
$n_1 = 1/h_1$ and $nh_1$ are integers. Let $f$ be the indicator function of the interval
$[t, 1]$, where $t$ is the middle of the unique interval of the form $[kh_1, (k + 1)h_1)$
containing 1/2. By the definition of $f$,

\[ \frac{\#\{i \in B_{k_1}^{h_1} : f(i/n) = 1\}}{\#B_{k_1}^{h_1}} \in [1/3, 2/3]. \]

Hence, because $M_{h_1,h_2}[Y_n](i) = M_{h_1,h_2}[Y_n](j)$ for all $i, j \in B_k^{h_1}$,

\[ \frac{1}{n} \sum_{i \in B_{k_1}^{h_1}} E[(M_{h_1,h_2}[Y_n](i) - f(i/n))^2] \ge C \frac{\#B_{k_1}^{h_1}}{n} = C h_1. \]

Now, because $U_n^{h_1}(k) = 0$ if $k \ne k_1$, we have

\[ \frac{1}{n} \sum_{k \ne k_1} \sum_{i \in B_k^{h_1}} E[(M_{h_1,h_2}[Y_n](i) - f(i/n))^2] = \frac{1}{n_1} \sum_{k \ne k_1} E[(M_{h_2}[Y'_n](k) - f(k/n_1))^2]. \]

We then use (9.8), which applies the same here even though we omit $k = k_1$.
Combining the cases $k = k_1$ and $k \ne k_1$, we get

\[ \mathcal{R}_n(M_{h_1,h_2}; f) \ge C\left(\frac{(\sigma'_n)^2}{n_1 h_2} + h_2 (\sigma'_n)^2\right) + C h_1. \]

We conclude by noticing that the right-hand side is larger than $n^{-2/3}$ for all
choices of $h_1 < h_2$.
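
The final claim of Section 13.2 can be checked numerically by minimizing the right-hand side of the lower bound over pairs $h_1 < h_2$. The sketch below sets the generic constants to 1 (an assumption of this example), uses $(\sigma'_n)^2 = 1/(2nh_1 + 1)$ and $n_1 = 1/h_1$ as above, and searches a log-spaced grid on $[1/n, 1]$. Function names are hypothetical.

```python
def two_stage_lower_bound(h1, h2, n):
    """(sigma')^2 / (n1 h2) + h2 (sigma')^2 + h1, with constants set to 1."""
    s2 = 1.0 / (2.0 * n * h1 + 1.0)  # (sigma'_n)^2 for sigma = 1
    n1 = 1.0 / h1
    return s2 / (n1 * h2) + h2 * s2 + h1


def minimize_lower_bound(n, grid_size=200):
    """Smallest value of the bound over grid pairs with 1/n <= h1 < h2 <= 1."""
    best = float("inf")
    for a in range(grid_size + 1):
        h1 = n ** (-1.0 + a / grid_size)
        for b in range(a + 1, grid_size + 1):
            h2 = n ** (-1.0 + b / grid_size)
            best = min(best, two_stage_lower_bound(h1, h2, n))
    return best


for n in (10 ** 4, 10 ** 6):
    print(n, round(minimize_lower_bound(n) * n ** (2 / 3), 3))
```

The rescaled minimum should remain of order one as $n$ grows, consistent with a lower bound of order $n^{-2/3}$ whatever the choice of $h_1 < h_2$.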



14. Proof of Theorem 5.2. 



14.1. Upper bound. We follow the line of arguments in Section 13.
Fix $f \in \mathcal{F}_2$. Again, we may assume that $N = 1$ without loss of generality.
Call $\gamma \in \Gamma^+$ the curve where $f$ is discontinuous and let

\[ K_1 = \{k : \gamma \cap (B_k^{h_1}/n) \ne \emptyset\}. \]

Here too, $|U_n^{h_1}(k)| \le \beta h_1$ for $k \notin K_1$ and $|U_n^{h_1}(k)| \le 1$ for $k \in K_1$. Also,
$\#K_1 \le C n_1^2 h_1$, which comes from the fact that $k \in K_1$ implies $\delta(k) \le \sqrt{2} h_1$
and the application of Lemma 16.1.

Using these facts and following the exact same arguments as for the
one-dimensional case, we get

\[ \mathcal{R}_n(M_{h_1,h_2}; f) \le C \frac{1}{n_1^2} \sum_{k \in I_{n_1}^2} E[(M_{h_2}[Y'_n](k) - f(k/n_1))^2] + C h_1. \]

Using the upper bound (12.1) on the first term, we get

\[ \mathcal{R}_n(M_{h_1,h_2}; f) \le C\left(h_2^2 + \frac{(\sigma'_n)^2}{(n_1 h_2)^2} + h_2 (\sigma'_n)^2\right) + C h_1. \]

Note that here $\sigma'_n \asymp (nh_1)^{-1}$. We then minimize over $h_1$ and $h_2$, with $h_1 =
n^{-6/7}$ and $h_2 = n^{-4/7}$, and obtain the desired upper bound valid for any such $f$.
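
The exponent bookkeeping behind the choice $h_1 = n^{-6/7}$, $h_2 = n^{-4/7}$ can be done in exact arithmetic. The sketch below writes $h_1 = n^{-a_1}$, $h_2 = n^{-a_2}$, uses $\sigma'_n \asymp (nh_1)^{-1}$ and $n_1 = 1/h_1$, and returns the power of $n$ carried by each term of the bound; the function name and dictionary keys are hypothetical.

```python
from fractions import Fraction as Fr


def two_stage_2d_exponents(a1, a2):
    """Exponent of n in each term of the 2D two-stage bound, for h1 = n**-a1, h2 = n**-a2."""
    sigma2 = -2 * (1 - a1)  # (sigma'_n)^2 ~ (n h1)^(-2)
    n1h2 = a1 - a2          # n1 h2 = n**(a1 - a2), since n1 = 1/h1
    return {
        "h2^2": -2 * a2,
        "sigma'^2/(n1 h2)^2": sigma2 - 2 * n1h2,
        "h2 sigma'^2": -a2 + sigma2,
        "h1": -a1,
    }


exponents = two_stage_2d_exponents(Fr(6, 7), Fr(4, 7))
for term, power in exponents.items():
    print(term, power)
print("rate exponent:", max(exponents.values()))
```

Three of the four terms balance at exponent $-6/7$ while $h_2^2$ is of smaller order, so the resulting bound is of order $n^{-6/7}$.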

14.2. Lower bound. The proof is completely parallel to the one-dimensional 
case, this time using (12.2). 

15. Variability of quantiles. 

Lemma 15.1. Assume $\Psi$ satisfies [Shape] and [Decay]. Then for all
$\alpha_1 < \zeta/(\zeta - 1) < \alpha_2$, there are positive constants $C_1, C_2$ such that

\[ C_1 (p(1 - p))^{-\alpha_1} \le \frac{d}{dp}\Psi^{-1}(p) \le C_2 (p(1 - p))^{-\alpha_2} \qquad \forall p \in (0, 1). \]

Moreover, if $\psi(x) x^{\zeta} \asymp 1$, then

\[ \frac{d}{dp}\Psi^{-1}(p) \asymp (p(1 - p))^{-\alpha}, \]

where $\alpha = \zeta/(\zeta - 1)$.

Proof. Let $1 < s < \zeta < t$ such that $\alpha_1 < s/(t - 1) < t/(s - 1) < \alpha_2$. We
have

\[ A_2 (1 + |x|)^{-t} \le \psi(x) \le A_1 (1 + |x|)^{-s} \qquad \forall x \in \mathbb{R}. \]

By integration, we also have

\[ B_2 (1 + |x|)^{-t+1} \le 1 - \Psi(x) \le B_1 (1 + |x|)^{-s+1} \qquad \forall x \ge 0. \]

Therefore,

\[ C_2^{-1} (1 - \Psi(x))^{t/(s-1)} \le \psi(x) \le C_1^{-1} (1 - \Psi(x))^{s/(t-1)} \qquad \forall x \ge 0. \]

By symmetry, we thus have

\[ C_2^{-1} (\Psi(x)(1 - \Psi(x)))^{t/(s-1)} \le \psi(x) \le C_1^{-1} (\Psi(x)(1 - \Psi(x)))^{s/(t-1)} \qquad \forall x \in \mathbb{R}. \]

This is equivalent to

\[ C_1 (p(1 - p))^{-s/(t-1)} \le \frac{d}{dp}\Psi^{-1}(p) \le C_2 (p(1 - p))^{-t/(s-1)} \qquad \forall p \in (0, 1). \]

For the last statement, follow the same steps. □

Lemma 15.2. Let $\Psi$ satisfy [Shape] and [Decay]. Then for all $\alpha_1 <
\zeta/(\zeta - 1) < \alpha_2$, there are positive constants $C_1, C_2$ such that

\[ C_1 (p(1 - p))^{-\alpha_1+1} \le \Psi^{-1}(p) \le C_2 (p(1 - p))^{-\alpha_2+1} \qquad \forall p \in (0, 1). \]

Moreover, if $\psi(x) x^{\zeta} \asymp 1$, then

\[ \Psi^{-1}(p) \asymp (p(1 - p))^{-\alpha+1}, \]

where $\alpha = \zeta/(\zeta - 1)$.

Proof. Integrate the result in Lemma 15.1. □

15.1. Proof of Lemma 6.2. Assume $\ell$ is odd for simplicity and
let $\ell = 2m + 1$. We have

\[ \Psi_{2m+1}(x) = (B_m \circ \Psi)(x/\sqrt{2m+1}), \]

where (see, e.g., [30]) $B_m$ is the $\beta$-distribution with parameters $(m + 1, m + 1)$:

\[ B_m(y) = \frac{(2m + 1)!}{(m!)^2} \int_0^y (u(1 - u))^m \, du. \]

Given that $\Psi$ has a continuous density $\psi$ and $B_m$ is continuously
differentiable, $\Psi_{2m+1}$ has a continuous density given by

\[ \psi_{2m+1}(x) = \frac{1}{\sqrt{2m+1}} \psi(x/\sqrt{2m+1}) \cdot (B'_m \circ \Psi)(x/\sqrt{2m+1}). \]

Moreover, since $\psi$ is unimodal and symmetric about 0 and $B'_m$ is unimodal
and symmetric about 1/2, $\psi_{2m+1}$ is unimodal and symmetric about
0. Therefore, $\Psi_{2m+1}$ satisfies [Shape]. $\Psi_{2m+1}$ also satisfies [Decay] since
$\psi_{2m+1}(x) \le C_m \psi(x/\sqrt{2m+1})$.

We now show that there is a constant $C$ such that, for $m$ large enough,
$\psi_{2m+1}(x)(1 + |x|)^4 \le C$ for all $x$. It is enough to consider $x \ge 0$, which we
do. Fix $s \in (1, \zeta)$. Using Stirling's formula and the fact that $\psi$ is bounded,
we find $C$ such that

\[ \psi_{2m+1}(x) \le C (4\Psi(x/\sqrt{2m+1})(1 - \Psi(x/\sqrt{2m+1})))^m. \]

In particular, $\psi_{2m+1}(x) \le C$ for all $x$. Since $s < \zeta$, we have
$(1 + x)^{s-1}(1 - \Psi(x)) \to 0$ as $x \to \infty$, so
there is $x_0 > 0$ such that $1 - \Psi(x) \le (1 + x)^{-s+1}/4$ for $x > x_0$. Now, for
$x \le x_0$, $(1 + x)^4 \psi_{2m+1}(x) \le C(1 + x_0)^4$; for $x > x_0$,

\[ (1 + x)^4 \psi_{2m+1}(x) \le C (1 + x)^4 \left(1 + \frac{x}{\sqrt{2m+1}}\right)^{-(s-1)m}. \]

By elementary calculus, as soon as $(s - 1)m > 4\sqrt{2m+1}$, which happens
when $m$ is large enough, the right-hand side is bounded by its value at $x_0$,
which is also bounded by $C(1 + x_0)^4$. □

16. Some properties of planar curves. This section borrows notation
from Sections 10 and 11.

Lemma 16.1. For $\gamma \in \Gamma_2(\lambda)$ and $h \ge 1/n$, $\#A \le C n^2 h$, $C = 20(\lambda + 1)$.

Proof. Assume $\gamma$ is parametrized by arclength. Let $s_k = h/2 + kh$, for
$k = 1, \ldots, \lceil \mathrm{length}(\gamma)/h \rceil$. By the triangle inequality, for each $x \in [0, 1]^2$ in
the $h$-neighborhood of $\gamma$, there is $k = 1, \ldots, \lceil \mathrm{length}(\gamma)/h \rceil$ such that $x$ and
$\gamma(s_k)$ are within distance $2h$. Each ball centered at $\gamma(s_k)$ and of radius $2h$,
$h \ge n^{-1}$, contains at most $20 n^2 h^2$ gridpoints. Therefore, the $h$-neighborhood
of $\gamma$ contains at most $\lceil \mathrm{length}(\gamma)/h \rceil \cdot 20 n^2 h^2 \le 20(\lambda + 1) n^2 h$ gridpoints. □
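
The counting bound of Lemma 16.1 is easy to test on a concrete curve. The sketch below counts the gridpoints $i/n$, $i \in \{1, \ldots, n\}^2$, within distance $h$ of a circle and compares the count with $20(\lambda + 1) n^2 h$, where $\lambda$ is taken to be the length of the circle. The circle, its centre and radius, and the function name are all assumptions of this example.

```python
import math


def count_near_circle(n, h, center=(0.5, 0.5), radius=0.25):
    """Number of gridpoints i/n, i in {1..n}^2, within distance h of the circle."""
    cx, cy = center
    count = 0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            dist_to_curve = abs(math.hypot(i / n - cx, j / n - cy) - radius)
            if dist_to_curve <= h:
                count += 1
    return count


n, h = 100, 0.05
curve_length = 2 * math.pi * 0.25
print(count_near_circle(n, h), 20 * (curve_length + 1) * n ** 2 * h)
```

The count grows roughly linearly in $h$ for fixed $n$, which is the feature of the lemma used to obtain $B^2 \le Ch$ in Sections 10 and 11.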

Lemma 16.2. There are constants ho,C±,C > such that, if h < h$, 
then for all i G A satisfying 6(1} > 2(Cih 2 + n~ l ), p(i) < 1 - C5{\)/h. 

Proof. Let ho be defined as in Lemma 16.3 and assume h < h^. Take 
x so that jf]B(x,h) 7^ 0, where B(x,h) is the disc of radius h centered at 
x. By Lemma 16.3, there are arclengths si < S2 such that 

7 n B(x, h) = {7(5) : si < s < s 2 }. 

A Taylor expansion of degree 2 gives 

| 7 (i) - 7 (a) - (i - 8 )Y(a)| < «/2(t - s) 2 Vs, t G [0, length (7)]. 

Together with the triangle inequality and the fact that |7'(s)| = 1 for all s 
and (7(52) — 7(si)| < 2h, this implies 

»2- si<k/2(s 2 - si) 2 + 2h. 

Therefore, there is C > such that s 2 — si < Ch. Applying this Taylor 
expansion twice also implies 

7(a) - 7(si) - (a - a x 

«2 - Si 

which now becomes 



<C x h 2 VsG[s!,s 2 ] 

for some constant C\ > 0. This means that, for all s G [si,S2]> 7(s) is within 
distance C\h 2 from the segment joining 7(si) and 7(^2). Let L be the line 
parallel to, and at distance C\h 2 from [7(^1), 7(s2)], that is, closest to x. 
The line L divides B(x,h) into two parts A and B(x, h) n A c , where x G A. 
Since we have d(x,L) > d(x,j) — d(L,j) = d(x,j) — C\h 2 , if d(x,j) > C±h 2 , 
A n 7 = and ^4 contains the closed half disc with diameter parallel to L 
that contains x. 



<k(s 2 -si) 2 VsG[si,s 2 ], 



7(s) -7(si) - (s-si) 



S2 - Si 
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Now, let x be of the form i/n with i G I 2 with 2Ci/i 2 + 2n~ x < <5(i) < ft,/2. 
Without loss of generality, assume that nh > 2. 

By symmetry, all open half discs of B(i/n,h) contain the same number 
of gridpoints, so that any closed half disc of B(i/n,h) contains more than 
half of the gridpoints within B(i/n,h). Let A = Aq U H, where H is the 
closed half disc with diameter parallel to L that does not intersect 7. We 
have Q{\) = I 2 HnA, so that #£(i) > #W(i)/2 + #(I 2 nnA ). A cont ains a 
rectangular region R with dimensions d(i/n,L) by 2^/K 1 — d(i/n, L) 2 , with 
d(i/n, L) = <f(i) - C1/1 2 > 2/n and 2^/j 2 — d(i/n,L)' 2 > v^/i > 2/n. For such 
a rectangular region, with sidelengths of at least $2/n$,

\[ R \subset \bigcup_{i/n \in R} B(i/n, 2/n), \]

so that

\[ \#(I_n^2 \cap nA_0) \ge \#(I_n^2 \cap nR) \ge C|R|n^2 \ge Ch\delta(i)n^2. \]

It follows that $p(i) \ge 1/2 + C\delta(i)/h$, which in turn implies
[...] $\le 1 - C\delta(i)/h$. We proved this for $i$ such that
$\delta(i) < h/2$; however, this obviously extends to $i$ such that
$\delta(i) < h$ with possibly a different constant $C$. □

Lemma 16.3. There is a constant $h_0 > 0$ such that the following holds
for all $\gamma \in \Gamma^+$. If $h < h_0$ and $x \in [0,1]^2$ are such that
$\gamma \cap B(x,h) \ne \emptyset$, then there are arclengths $s_1 \le s_2$
such that $\gamma \cap B(x,h) = \{\gamma(s) : s_1 \le s \le s_2\}$.

Proof. We assume $\gamma$ is parametrized by arclength and consider
arclengths modulo $\mathrm{length}(\gamma)$.

Take $x \in [0,1]^2$ and $h > 0$ such that $\gamma \cap B(x,h) \ne \emptyset$.
If $\gamma \subset B(x,h)$, then $\gamma$ has maximum curvature bounded below
by $h^{-1}$. We arrive at the same conclusion if $\gamma \cap \partial B(x,h)$
has infinite cardinality, for then $\gamma \cap \partial B(x,h)$ would have at
least one accumulation point (since it is compact) at which the curvature
would be exactly $h^{-1}$. Suppose $h < \kappa^{-1}$, so that $\gamma$ is not
included in $B(x,h)$ and $\gamma \cap \partial B(x,h)$ is nonempty and finite.
Consider the set of arclengths $s$ with the property that there exists
$\varepsilon_0 > 0$ such that, for all $0 < \varepsilon < \varepsilon_0$,
$\gamma(s - \varepsilon) \in B(x,h)$ and $\gamma(s + \varepsilon) \notin B(x,h)$;
this set is discrete, and therefore of the form
$\{0 \le s_1 < \cdots < s_m < \mathrm{length}(\gamma)\}$. Note that because
$\gamma$ is closed, $m$ is even.

Define $s_{m+1} = \mathrm{length}(\gamma)$. We may assume that $s_1 = 0$ and
$\gamma(s) \in B(x,h)$ for all $s \in [s_1, s_2]$. Then, for all
$k = 1, \ldots, m/2$, $\gamma(s) \notin B(x,h)$ for all
$s \in (s_{2k}, s_{2k+1})$.

Because $\gamma \in \Gamma^+$, we have, for all $k = 1, \ldots, m/2$,

\[ \min\left\{ \frac{s_{2k+1} - s_{2k}}{|\gamma(s_{2k+1}) - \gamma(s_{2k})|},\ \frac{\mathrm{length}(\gamma) - s_{2k+1} + s_{2k}}{|\gamma(s_{2k+1}) - \gamma(s_{2k})|} \right\} \le \theta. \]

Suppose $m > 2$, so that $m \ge 4$. If
$s_3 - s_2 \le \mathrm{length}(\gamma) - s_3 + s_2$, then
$s_3 - s_2 \le \theta|\gamma(s_3) - \gamma(s_2)| \le 2\theta h$. Otherwise
$s_3 - s_2 > \mathrm{length}(\gamma) - s_3 + s_2$, and since
$s_1 < s_2 < s_3 < s_m$, this implies that
$\mathrm{length}(\gamma) - s_m + s_1 < s_m - s_1$ and so
$\mathrm{length}(\gamma) - s_m + s_1 \le \theta|\gamma(s_m) - \gamma(s_1)| \le 2\theta h$.
This is in turn equivalent to
$s_{m+1} - s_m \le \theta|\gamma(s_{m+1}) - \gamma(s_m)| \le 2\theta h$.
In both cases, there is $k = 1, \ldots, m/2$ such that
$s_{2k+1} - s_{2k} \le \theta|\gamma(s_{2k+1}) - \gamma(s_{2k})| \le 2\theta h$.
Fix such a $k$. Let $a$ be the angle between $\gamma'(s_{2k})$ and
$\gamma'(s_{2k+1})$ and let $b$ be the angle between $[x, \gamma(s_{2k})]$
and $[x, \gamma(s_{2k+1})]$. We have

\[ \cos(a) = \langle \gamma'(s_{2k}), \gamma'(s_{2k+1}) \rangle = 1 - \frac{|\gamma'(s_{2k}) - \gamma'(s_{2k+1})|^2}{2}, \]

with $|\gamma'(s_{2k}) - \gamma'(s_{2k+1})| \le \kappa(s_{2k+1} - s_{2k}) \le 2\kappa\theta h$.
Suppose $h < (\sqrt{2}\kappa\theta)^{-1}$, so that
$a \le C_1(s_{2k+1} - s_{2k})$, where $C_1 = C_1(\kappa, \theta)$. We also have

\[ \sin(b/2) = \frac{|\gamma(s_{2k}) - \gamma(s_{2k+1})|}{2h} \ge \frac{s_{2k+1} - s_{2k}}{2\theta h}, \]

so that $b \ge C_2(s_{2k+1} - s_{2k})/h$, where $C_2 = C_2(\kappa, \theta)$.
Now, because $\gamma'(s_{2k})$ is either tangent or pointing outward and
$\gamma'(s_{2k+1})$ is either tangent or pointing inward with respect to
$B(x,h)$, we have $a \ge b$; if they are both tangent to $B(x,h)$, $a = b$.
Therefore, $h \ge C_2/C_1$.

We thus let $h_0 = h_0(\kappa, \theta)$ be the minimum over all the
constraints on $h$ and $C_2/C_1$. □
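
As an informal numerical illustration of Lemma 16.3 (not part of the proof), the following Python sketch samples a closed curve and counts how many separate arcs of it fall inside a ball B(x, h). The ellipse, the sample count and the name `arcs_inside_ball` are assumptions chosen for illustration only; the curve is parametrized by angle rather than arclength, which does not change the number of arcs.

```python
import math

def arcs_inside_ball(curve, x, h, samples=20000):
    """Count maximal runs of consecutive curve samples lying in the closed ball B(x, h)."""
    inside = []
    for j in range(samples):
        t = 2.0 * math.pi * j / samples
        px, py = curve(t)
        inside.append(math.hypot(px - x[0], py - x[1]) <= h)
    if all(inside):
        return 1  # the whole curve lies in the ball
    # A run starts wherever an inside sample follows an outside one;
    # index -1 wraps around, which accounts for the curve being closed.
    return sum(1 for j in range(samples) if inside[j] and not inside[j - 1])

def ellipse(t):
    """Hypothetical test curve: ellipse centred at (0.5, 0.5), semi-axes 0.4 and 0.1."""
    return (0.5 + 0.4 * math.cos(t), 0.5 + 0.1 * math.sin(t))

# Small ball centred at a point of the curve (the top of the ellipse).
small = arcs_inside_ball(ellipse, (0.5, 0.6), 0.05)
# Larger ball centred at the middle of the ellipse.
large = arcs_inside_ball(ellipse, (0.5, 0.5), 0.2)
print(small, large)
```

For this ellipse, the small ball meets the curve in a single arc, while the larger ball cuts out two arcs, which is the configuration the lemma excludes once h is small enough relative to the curvature and chord-arc constants.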

Lemma 16.4. There is a constant $C > 0$ such that, for all $h > 0$ and
$\ell = 0, \ldots, nh$,

\[ \#H_\ell \le Cn. \]

Proof. For $i \in H_\ell$, $B(i/n, 1/(2n)) \subset T$, where
$T = \{x : \ell - 1 \le n\,d(x,\gamma) \le \ell + 2\}$. Since those balls do
not intersect, we have $\#H_\ell \cdot C_1/n^2 \le |T|$, $C_1 = \pi/4$.

As in the proof of Lemma 16.1, assume $\gamma$ is parametrized by arclength.
Let $s_k = 1/(2n) + k/n$, for $k = 1, \ldots, [n\,\mathrm{length}(\gamma)]$.
Let $n(s)$ be the normal vector to $\gamma$ at $\gamma(s)$ pointing out.
Define $x_k^{\pm} = \gamma(s_k) \pm (\ell/n)\,n(s_k)$. Take $x \in T$, say
outside of $\gamma$; it is of the form $\gamma(s) + a\,n(s)$, with
$a \in [(\ell - 1)/n, (\ell + 2)/n]$. Let $k$ be such that
$|s - s_k| \le 1/n$. By the triangle inequality, we have

\[
\begin{aligned}
|x - x_k^{+}| &\le |\gamma(s) - \gamma(s_k)| + |a - \ell/n| + |n(s) - n(s_k)| \\
&\le |s - s_k|(1 + \kappa) + |a - \ell/n| \\
&\le C_2/n, \qquad C_2 = 3 + \kappa;
\end{aligned}
\]

here we used $|n(s) - n(s_k)| = |\gamma'(s) - \gamma'(s_k)| \le \kappa|s - s_k|$.
Therefore, $T \subset \bigcup_k B(x_k^{\pm}, C_2/n)$, so that
$|T| \le n \cdot \mathrm{length}(\gamma) \cdot C_2/n^2 \le C_2 \cdot \mathrm{length}(\gamma)/n$.
In the end, we have
$\#H_\ell \le C_1^{-1}|T|n^2 \le C_1^{-1} \cdot C_2 \cdot \mathrm{length}(\gamma) \cdot n$. □
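
The linear-in-n count of Lemma 16.4 can be checked numerically on a simple example. The sketch below is only an illustration under stated assumptions: the curve is taken to be a circle, the grid is taken to be {0, 1/n, ..., 1}^2, and the band of index l is assumed to consist of the grid points i/n with l <= n d(i/n, curve) < l + 1, which is consistent with the set T used in the proof but is not spelled out in this section. The name `band_counts` is hypothetical.

```python
import math

def band_counts(n, h, center=(0.5, 0.5), radius=0.3):
    """counts[l] = number of grid points i/n with l <= n * d(i/n, circle) < l + 1,
    for l = 0, ..., floor(n * h). The circle stands in for the curve."""
    top = int(n * h)
    counts = [0] * (top + 1)
    for i in range(n + 1):
        for j in range(n + 1):
            # Distance from the grid point to the circle itself (not to its centre).
            d = abs(math.hypot(i / n - center[0], j / n - center[1]) - radius)
            l = int(n * d)
            if l <= top:
                counts[l] += 1
    return counts

for n in (50, 100, 200):
    worst = max(band_counts(n, 0.1))
    # The ratio worst / n should stay bounded as n grows.
    print(n, worst, worst / n)
```

Each band has area roughly proportional to length(curve)/n, so the number of grid points it contains grows like n rather than n^2; the printed ratio gives an empirical stand-in for the constant C in this particular example.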






REFERENCES 

[1] Anonymous. (2007). Median filter. Wikipedia. 

[2] Barner, K. and Arce, G. R. (2003). Nonlinear Signal and Image Processing: The- 
ory, Methods, and Applications. CRC Press, Boca Raton, FL. 
[3] Bottema, M. J. (1991). Deterministic properties of analog median niters. IEEE 

Trans. Inform. Theory 37 1629-1640. MR1 134302 
[4] Brandt, J. (1998). Cycles of medians. Util. Math. 54 111-126. MR1658177 
[5] Candes, E. J. and Donoho, D. L. (2002). Recovering edges in ill-posed inverse 

problems: Optimality of curvelet frames. Ann. Statist. 30 784-842. MR1922542 
[6] Candes, E. J. and Donoho, D. L. (2002). Recovering edges in ill-posed inverse 

problems: Optimality of curvelet frames. Ann. Statist. 30 784-842. MR1922542 
[7] Cao, F. (1998). Partial differential equations and mathematical morphology. J. Math. 

Pures Appl. 77 909-941. MR1656780 
[8] Caselles, V., Sapiro, G. and Chung, D. H. (2000). Vector median filters, inf- 

sup operations, and coupled PDEs: Theoretical connections. J. Math. Imaging 

Vision 12 109-119. MR1745601 
[9] Donoho, D. L. (1999). Wedgelets: Nearly minimax estimation of edges. Ann. 

Statist. 27 859-897. MR1724034 
[10] Donoho, D. L. and Johnstone, I. M. (1998). Minimax estimation via wavelet 

shrinkage. Ann. Statist. 26 879-921. MR1635414 
[11] Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. and Picard, D. (1995). 

Wavelet shrinkage: Asymptopia? J. Roy. Statist. Soc. Ser. B 57 301-369. 

MR1323344 

[12] Donoho, D. L. and Yu, T. P.-Y. (2000). Nonlinear pyramid transforms based on 
median-interpolation. SI AM J. Math. Anal. 31 1030-1061. MR1759198 

[13] Fan, J. and Hall, P. (1994). On curve estimation by minimizing mean absolute 
deviation and its implications. Ann. Statist. 22 867-885. MR1292544 

[14] Gu, J., Meng, M., Cook, A. and Faulkner, M. G. (2000). Analysis of eye tracking 
movements using fir median hybrid filters. In ETRA '00: Proceedings of the 2000 
symposium on Eye tracking research and applications 65-69. ACM Press, New 
York. 

[15] Guichard, F. and Morel, J.-M. (1997). Partial differential equations and image 
iterative filtering. In The State of the Art in Numerical Analysis (York, 1996). 
Inst. Math. Appl. Conf. Ser. New Ser. 63 525-562. Oxford Univ. Press, New 
York. MR1628359 

[16] Gupta, M. and Chen, T. (2001). Vector color filter array demosaicing. In Sensors 
and Camera Systems for Scientific, Industrial, and Digital Photography Appli- 
cations. II (M. B. J. C. N. Sampat, ed.). Proceedings of the SPIE 4306 374-382. 
SPIE, Bellingham, WA. 

[17] Hamza, A. B., Luque-Escamilla, P. L., Martinez- Aroza, J. and Roman- 
Roldan, R. (1999). Removing noise and preserving details with relaxed median 
filters. J. Math. Imaging Vision 11 161-177. MR1727352 

[18] Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. 
Statist. 35 73-101. MR0161415 

[19] JuSTUSSON, B. (1981). Median filtering: Statistical properties. In Two-Dimensional 
Digital Signal Processing. II (T. S. Huang, ed.). Topics in Applied Physics 43 
161-196. Springer, Berlin. MR0688317 

[20] Koch, I. (1996). On the asymptotic performance of median smoothers in image 
analysis and nonparametric regression. Ann. Statist. 24 1648-1666. MR1416654 



36 



E. ARIAS-CASTRO AND D. L. DONOHO 



[21] Korostelev, A. P. and Tsybakov, A. B. (1993). Minimax Theory of Image Re- 
construction. Lecture Notes in Statistics 82. Springer, New York. MR1226450 

[22] Mallows, C. L. (1979). Some theoretical results on Tukey's 3R smoother. In 
Smoothing Techniques for Curve Estimation (Proc. Workshop, Heidelberg, 
1979). Lecture Notes in Math. 757 77-90. Springer, Berlin. MR0564253 

[23] Mason, D. M. (1984). Weak convergence of the weighted empirical quantile process 
in L 2 (0, 1). Ann. Probab. 12 243-255. MR0723743 

[24] Matheron, G. (1975). Random Sets and Integral Geometry. Wiley, New York. 
MR0385969 

[25] Morel, J.-M. and Solimini, S. (1995). Variational Methods in Image Segmentation. 

Birkhauser Boston, Boston, MA. MR1321598 
[26] Mumford, D. and Shah, J. (1989). Optimal approximations by piecewise smooth 

functions and associated variational problems. Comm. Pure Appl. Math. 42 577- 

685. MR0997568 

[27] Perona, P. and Malik, J. (1990). Scale-space and edge detection using anisotropic 
diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 12 629-639. 

[28] Piterbarg, L. I. (1984). Median filtering of random processes. Problemy Peredachi 
Informatsu 20 65-73. MR0776767 

[29] ROUSSEEUW, P. J. and Bassett, G. W. Jr. (1990). The remedian: A robust averag- 
ing method for large data sets. J. Amer. Statist. Assoc. 85 97-104. MR1137355 

[30] Rousseeuw, P. J. and Bassett, G. W. Jr. (1990). The remedian: A robust averag- 
ing method for large data sets. J. Amer. Statist. Assoc. 85 97-104. MR1 137355 

[31] Sapiro, G. (2001). Geometric Partial Differential Equations and Image Analysis. 
Cambridge Univ. Press, Cambridge. MR1813971 

[32] Semmes, S. W. (1988). Quasiconformal mappings and chord-arc curves. Trans. Amer. 
Math. Soc. 306 233-263. MR0927689 

[33] Serra, J. (1982). Image Analysis and Mathematical Morphology. Academic Press, 
London. MR0753649 

[34] Sethian, J. A. (1999). Level Set Methods and Fast Marching Methods, 2nd ed. Cam- 
bridge Monographs on Applied and Computational Mathematics 3. Cambridge 
Univ. Press, Cambridge. MR1700751 

[35] Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Aplications 
to Statistics. Wiley, New York. MR0838963 

[36] Stranneby, D. (2001). Digital Signal Processing: DSP and Applications. Oxford 
Univ. Press, London, UK. 

[37] Tukey, J. W. (1977). Exploratory Data Analysis. Addison- Wesley, Reading, MA. 

[38] Velleman, P. and Hoaglin, D. (1981). Applications, Basics, and Computing of 
Exploratory Data Analysis. Duxbury, North Scituate, MA. 



Department of Mathematics 
University of California, San Diego 
9500 Gilman Drive 
La Jolla, California 92093-0112 
E-MAIL: eariasca@math.ucsd.edu



Department of Statistics 
Stanford University 
390 Serra Mall 

Stanford, California 94305-4065 
E-MAIL: donoho@stat.stanford.edu 



